ABSTRACT

Empirical techniques are developed that may be used
in conjunction with data stored in the Institutional Profile System
to enhance present capabilities of assessing group structure in
medical schools. Relevant literature is reviewed, and the
institutionally descriptive data available for analysis and their
manipulation into researchable formats are described. In order to
relate the present data and methods to previous studies, variables
similar to those used by the RAND Corporation were chosen. The data
are factor analyzed, the factor scores are then used in two empirical
cluster analysis procedures, and the results are compared with those
generated by RAND. Lack of replication is noted. The 10 clusters of
medical institutions found by RAND were not found in this study. A
variety of factors may have contributed to this result, including
a substantial number of both methodological and data differences
between the studies. It is concluded that, at least for the present,
categorization of medical schools by procedures accounting for
multiple measures simultaneously does not yield clear and unambiguous
results. This finding indicates that the picture presented by data
institutionally descriptive of the schools is a highly complex one,
not easily structured into a reasonably small number of groups of
institutions. (LBH)
Chapter I



INTRODUCTION 



This report, addressing Article I, Section 5, Part
b(1) of the "Series of Analytical Studies on Medical
Education and Academic Health Centers" contract between
the Bureau of Health Manpower of DHEW and the Association
of American Medical Colleges, describes the development
and application of several cluster analysis techniques
to data descriptive of U.S. medical schools for the
purpose of classifying the schools into several categories
or groups. The tasks set forth in the contract are as
follows:



(a) A general classification methodology shall be
developed to identify parameters which
are manifest in available data and which reflect
commonalities or dissimilarities across institutions.

(b) The methodological approach shall focus upon
developing analytic clustering methods useful
for classifying institutions on the basis of any
type of empirical data. Such a scheme shall
provide a series of classifications corresponding
to the particular subsets of data used.
Data subsets so visualized will include, e.g.,
faculty mix, student mix, and output characteristics.

(c) The classification structures will then be 
cross-validated against any other quantifiable 
data representing congruent information, other 
published research in the field, and verbal 
reactions by the medical education community, if 
available. 

(d) Submit developed methodology in the form of a report. 



This report is organized as follows. A review of
relevant literature and an overview of the present
study are given in Chapter I. In Chapter II, a description
of the empirical cluster analysis techniques chosen
for development and application is presented. A description
of the institutionally descriptive data available
for analysis and the manipulation of these data into
researchable formats is given in Chapter III. The analysis
of the data by empirical cluster analysis methods is
presented in Chapter IV. Finally, Chapter V presents
conclusions that may be drawn from the study and suggests
steps for further analysis.

A. Review of the Literature

The need to classify medical institutions into a
reasonably small number of groups is frequently voiced by
those having to deal with U.S. medical schools in the process
of policy development. There are currently 117 institutions
in the U.S. at various stages of accreditation as medical 
schools. These 117 institutions present a diverse picture 
when viewed on institutionally descriptive measures, such 
as number and type of students, number and type of faculty, 
size and pattern of expenditures, curricula, and facilities. 
In the absence of better schemes, schools are
frequently classified on the basis of one or two selected
measures. Classification by region and by type of ownership
(public/private) is common but provides
limited insight into the real complexities of medical
schools. A private school in the Western region, for
example, may be quite similar along many measurable
dimensions to a public school in the Eastern region.
Simple classificatory schemes tend to ignore such similarities.
The present study is an attempt to develop classificatory
methods capable of analyzing multiple measures
simultaneously and subsequently grouping schools on the
basis of similarities represented by those measures.

There have been several efforts to derive classifi- 
catory schemes for U.S. medical schools on institutionally 
descriptive measures. In particular, three recent studies
are worthy of review.

Rodgers and Elton (1974), essentially replicating
another study by Richards (1967), factor analyzed 14
variables descriptive of U.S. medical schools and then, based
on the resulting factors, compared medical schools to one
another through a technique known as "spatial configuration."
Rodgers and Elton identified the two factors "affluence" and "size,"
also found by Richards, but noted an additional factor
labeled "graduate emphasis." To summarize the various
scores attained by each medical school relative to these
three factors, a plot of the "spatial configuration" was provided
to illustrate the proximity (similarity/dissimilarity) of
medical schools to one another as represented in two-dimensional
space.

The objective of the Otis study (1975) was to produce a
general typology of U.S. medical schools for subsequent application
in an analysis of "rates of production of differing
types of physicians." Otis chose variables from several public
sources and cluster analyzed related groups of them into five
dimensions: size, eminence, clerkship versus basic science requirements,
elective emphasis, and services versus science funding.
The individual scores for each medical school on these five
dimensions were cluster analyzed by the BC TRY object clustering
routines (Tryon, 1970), producing ten medical school "types."

In 1972 the RAND Corporation conducted an extensive study
of ten medical schools in the U.S. for purposes of a broader
analysis of health manpower issues. In order to ensure that the
ten schools selected were broadly representative of the entire
population, multivariate cluster analysis was first applied to
six factors (linear combinations of variables) to form ten groups.
From each of these ten groups, a single medical school was then
selected (Keeler et al., 1972). This study used a
methodological approach similar to that of the present study.
As an application of the capability developed in this project,
a classification similar to that of the RAND study was performed
(Chapter IV), where the RAND study is also described in greater detail.
B. Overview to the Current Study

The AAMC currently maintains in the Institutional Profile 
System a data base comprising over five thousand variables 
describing 117 U.S. medical schools. Coupled with this data 
source is a "user oriented" computer software package which 
offers a wide range of statistical and descriptive summary 
devices. This on-line system, which may be accessed through 
remote terminal sites, is intended to provide a facility for the 
exchange of information between members of the academic health 
community. It also provides a rich source of data for applied
studies.

The general goal of this study is to develop empirical 
techniques that may be used in conjunction with data stored 
in the Institutional Profile System to enhance present 
capabilities of assessing group structure among medical schools. 
Primarily, the intention is to provide a means to group schools 
with similar profiles on a large number of measures. It should 
be pointed out, however, that there is no one cluster solution 
which will adequately characterize medical schools for all 
purposes; different solutions will be defined by different 
needs. One of the immediate objectives of this study is, 
therefore, to develop a methodology that may be used to augment 
the inter-institutional comparative methods currently available 
to users of the Institutional Profile System. 
A detailed description of the types of methods available
for such work and the methods chosen for implementation in
this study is given in Chapter II. Empirical cluster analysis
methods fall into a general category of statistically based
techniques that may be labeled applied multivariate descriptive
analysis procedures. Other procedures falling
into this category are factor analysis and multidimensional
scaling. These procedures rarely yield exact, unequivocal
results comparable, for example, to the probability statements that
come from hypothesis-testing statistical procedures. Rather,
these techniques are better viewed as procedures that reduce
highly complex multivariate data into simpler, perhaps more
revealing, formats.

On the substantive side of the study, variables on hand in
the Institutional Profile System have been chosen and analyzed
by empirical cluster analysis methods. An extensive set of
variables (about 350) has been extracted from the IPS and prepared
for research purposes. This set of data forms the basis not
only for the present study but also for a large-scale factor
analytic descriptive study (Sherman, 1975) and a study of the
effects of changes in class size (Sedlacek, 1975). This data
extraction and variable manipulation represents the first
large-scale use of the Institutional Profile System for research
in which applied multivariate methods have been used. The
construction of this researchable data base is described in
Chapter III.
The application of empirical cluster analysis methods
to a subset of variables from the researchable set is
described in Chapter IV. In order to relate the present data
and methods to previous studies, variables similar to those used
by the RAND Corporation have been chosen. The data are factor
analyzed, the factor scores are then employed in
two empirical cluster analysis procedures, and the results
are compared with those generated by RAND. Differences between
the RAND results and the results of the present study are discussed
in Chapter IV.

Chapter V presents conclusions concerning both the data
and the methods as well as suggestions for further work.
The cluster analysis procedures developed by this study are
now available to users of the Institutional Profile System
for application to selected subsets of variables.
Chapter II



METHODS 



The term cluster analysis refers to a large body
of methodological procedures designed to locate distinct
groups of objects in which objects belonging to a group
are in some way similar to each other but dissimilar to
objects in other groups. The procedures available for such
purposes range from highly subjective, judgement-oriented
methods to highly objective, statistically based methods.
The cluster analysis procedures used for the present study
come from the objective, statistically based end of this
continuum, although some subjective judgements play a role
in the results obtained.

This chapter describes the two approaches to cluster
analysis used in the present study. It is important to
understand that cluster analysis techniques are widely
diverse and serve varied objectives. The benefits of one
technique over another are realized only in light of the
nature of the data in question and the purpose to which the
results are to be put. These considerations in turn contribute
to the operational meaning of the term "cluster." Given this
background, the two approaches described in this chapter require
somewhat different data attributes, and the results are interpreted
accordingly.
Strictly interpreted, a key assumption of statistically
based clustering procedures is that all objects must be placed
into one and only one cluster. These procedures partition
the entire set of objects (medical schools) into mutually
exclusive and exhaustive subsets. This partitioning may take
place whether the data sets are completely random or highly
structured (i.e., whether or not there really are natural
groupings). Outliers (unique objects similar to no other
object) are either included in a cluster with other objects
or constitute clusters by themselves. This conceptual constraint
should be kept in mind when interpreting the results
of any statistically based clustering procedure.

The two approaches to cluster analysis used in this
study are in one case "hierarchical" and in the other "non-hierarchical."
Each approach is described first in general
terms that include an illustration of typical results and then
in specific terms that detail the methodology chosen for this
study.

A. Hierarchical Clustering Schemes

Hierarchical cluster analysis schemes generally construct
groups of objects through a progression of stepwise merges.
Initially, each object is considered a cluster in and of
itself. A determination is then made as to which two clusters
are most similar, whereupon these two clusters are merged.
The process is then repeated until no further merge is possible.



This process starts with n objects or clusters, yields
n-1 clusters after the first merge, n-2 clusters after the
second merge, etc., until only one cluster (containing all n
objects) remains. Hierarchical clustering schemes falling
into this general framework have been labeled "agglomerative"
hierarchical cluster analysis techniques.*

A feature of hierarchical, as opposed to non-hierarchical,
methods is that once objects are grouped together they may
not be separated later in the process. This feature offers
both an advantage and a disadvantage. The early decisions
greatly reduce the number of possible merges or changes that
may take place later, thus allowing greater efficiency in
the procedure. However, it precludes adjustment or reversal
of unfortunate merges which have taken place earlier in the
process.

Specific hierarchical clustering techniques differ from
each other primarily in the criterion used to determine the
basis for admittance of objects into clusters. These hierarchical
procedures are described in this context in sections one
and two below. In either case, an index depicting the status
of the merge process, as plotted against the existing number
of clusters, will often be helpful in determining an optimal
solution found somewhere between the two extremes of n clusters
(one object per cluster) and one cluster containing all
n objects.

* There are a number of hierarchical techniques which work in
a similar but reverse manner. They begin with a single cluster,
containing all objects, and proceed to successively segment clusters
into smaller and smaller groups. Such techniques are called
"divisive" hierarchical cluster analysis procedures. They are not
used in the study described in this paper.

1. Illustration 

To illustrate hierarchical clustering, consider the
agglomerative procedure called the diameter method by
Johnson (1967). This method fits into a general class of
methods known as complete linkage. In the diameter method,
the basic data analyzed form an n x n matrix of Euclidean
distances, where n is the number of objects to be clustered.
At the first stage of clustering, the two objects with the
smallest distance separating them are grouped together. At
the next and all succeeding stages, an object is added to a
cluster (note that a cluster may consist of only one object)
only if the distance between it, the candidate object, and
all objects within a cluster is less than its distance to
all objects not in that cluster.

For example, four objects have the following distance 
matrix: 



        A      B      C      D

A       0     1.0    2.0    1.5

B              0     0.5    2.5

This matrix indicates that the distance between object A and
object B is 1.0 units, that the distance between object B
and object D is 2.5 units, etc.

At the first stage in Johnson's diameter method, objects
B and C are grouped together because they are separated
by the smallest distance in the matrix (i.e., they are most
similar). At the second stage, the distance between object A
and object D (1.5 units) is smaller than the distance between
A and C (2.0 units); therefore, neither A nor D may be added
to the B-C cluster, and A and D are grouped together to form the
second cluster. At the final stage, the two clusters (B-C and A-D)
are grouped together to form one cluster containing all four
objects.
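The merge sequence just described can be traced with a short sketch. It applies the standard complete-linkage rule (the distance between two clusters is the largest distance between any pair of their members), which is assumed here to stand in for Johnson's criterion. The C-D distance does not appear in the matrix as reproduced above, so the value 3.0 below is a hypothetical placeholder; the function names are illustrative only.

```python
from itertools import combinations

# Pairwise distances stated in the text. The C-D entry is NOT given in the
# source; 3.0 is an assumed placeholder so that the example can run.
DIST = {
    frozenset("AB"): 1.0,
    frozenset("AC"): 2.0,
    frozenset("AD"): 1.5,
    frozenset("BC"): 0.5,
    frozenset("BD"): 2.5,
    frozenset("CD"): 3.0,  # assumed value
}


def diameter(cluster_a, cluster_b):
    """Complete-linkage distance: largest pairwise distance across two clusters."""
    return max(DIST[frozenset((x, y))] for x in cluster_a for y in cluster_b)


def complete_linkage(objects):
    """Merge the two closest clusters repeatedly; return the list of merges."""
    clusters = [frozenset(o) for o in objects]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: diameter(*pair))
        merges.append((sorted(a), sorted(b), diameter(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


for left, right, height in complete_linkage("ABCD"):
    print(left, "+", right, "at distance", height)
```

Under these assumptions B and C merge first (0.5), A and D merge second (1.5), and the two pairs merge last. The height of the final merge depends on the assumed C-D value; the order of the first two merges does not, provided that value exceeds 0.5.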

The merge criterion suggested by Johnson for this method
is quite stringent and as a result produces clusters that
are highly homogeneous. Although this characteristic may be
beneficial under some circumstances, complete
linkage methods are frequently excessively constraining and fragmentary
in their formulation of clusters. As Bailey summarizes: "complete
linkage methods...dilate space. This means that the
existing clusters move away from unclustered individuals as the
clusters grow so that such individuals are more likely to form
nuclei of new clusters than to add to pre-existing ones."
Despite this criticism, it is useful to examine the
results produced by Johnson's diameter method because the
method is simple and easy to interpret. To illustrate both
the application of Johnson's diameter method and the typical
graphic summary of results that normally accompanies hierarchical
clustering, a dendrogram (tree diagram) is presented in
Figure 2.1. Using data on each of 99 medical institutions for
each of 6 factor scores (see Chapter IV for a description of
the factor scores), a 99 x 99 matrix of distances has been
generated and submitted to analysis using the Johnson diameter
method. Results shown in the figure are for 25 of the 99
institutions. The school names are listed on the left side
of the page, and the critical distance for each merge is shown
across the bottom of the figure. The sequence in which merges
take place is recorded on the top of the figure.

A dendrogram may be interpreted by observing the development
of linkages shown by the series of interconnecting lines.
Before a merge between any two schools takes place (at any
point before merge sequence "1" is encountered), each school
has a single line projecting horizontally to the right. At
this stage there exist as many clusters as there are medical
schools (25). The first merge, depicted by a link or vertical
connecting line, occurs between Arkansas and Louisville. The
distance separating these two schools can be determined by
locating the corresponding position on the scale, "Level of
Similarity." In this case, Arkansas and Louisville are
henceforth treated as a unit and, as such, are no longer
denoted by individual horizontal lines but by a single
line reflecting their joint status as members of the same
cluster. The same steps may be applied throughout the
dendrogram through n-1 merges, until all medical schools fall
into one cluster. For purposes of illustration, a twenty-fifth
merge has been included on the merge sequence scale.
At this point in the clustering process, all 25
medical schools listed in the present dendrogram merge as a
single group with another such group to form one cluster.

FIGURE 2.1

Dendrogram: Illustration of Hierarchical Cluster Formulation
Based on Johnson's Diameter Method

Since hierarchical techniques potentially present n-1 
cluster solutions, some guidance is needed in selecting an 
"optimal" solution, i.e. to answer the question: "How many 
clusters are there?" This determination will often be 
more apparent when plotting the number of clusters existing 
at any given stage against the critical distance (or whatever 
criterion is used in the merging process). Such a plot, 
for the full 99 school analysis using the Johnson "diameter" 
method, is given in Figure 2.2. 

This plot allows one to weigh the benefits of condensing
clusters against the sacrifices in group cohesion, expressed as
the critical distances needed to facilitate a merge. The reduction
in the number of clusters from 99 to 40, for instance, incurs
only a slight relaxation in the critical distance. At the
other extreme, merging into progressively fewer clusters entails
extending the critical distance disproportionately. Depending
upon one's particular objective, an optimal solution is most
likely found between 10 and 20 clusters.

2. Ward's Objective Function Technique

The hierarchical technique chosen for the present work
is known as Ward's Objective Function method. More general
than Johnson's diameter method, this approach "conserves"
rather than "dilates" space. Rather than considering individual
similarity measures between objects in the merging decision
process, Ward's technique uses a general function based upon
within-groups and between-groups "sums of squares." The
general idea is to merge objects (or clusters) that produce the
least increase in the within-groups or "error" sum of squares.

More specifically, one may calculate the sum of within-groups
squared deviations as follows:

$$SS_w = \sum_{i=1}^{g} \sum_{j=1}^{n_i} \left( X_{ij} - \bar{X}_i \right)^2$$

where

$SS_w$ = within-groups sum of squared deviations
$X_{ij}$ = value for the jth object in the ith cluster
$\bar{X}_i$ = mean value for the ith group
$n_i$ = number of objects in the ith group
$g$ = number of groups.

The within-groups sum of squared deviations is essentially
a measure of the collective compactness of the solution.
Cluster solutions with groups having members with highly
similar profiles will yield low values for the within-groups
sum of squared deviations.
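As a minimal sketch of this quantity, the function below computes the within-groups sum of squared deviations for a proposed set of clusters. The definition above is written for a single variable; summing the squared deviations over several variables (for example, several factor scores) is an assumption made here for illustration, and the function name and sample data are hypothetical.

```python
def within_groups_ss(clusters):
    """Total squared deviation of every object from the mean of its own cluster.

    `clusters` is a list of clusters; each cluster is a list of objects, and
    each object is a tuple of numeric scores (one score per variable).
    """
    total = 0.0
    for members in clusters:
        n_vars = len(members[0])
        # Cluster mean on each variable.
        mean = [sum(obj[k] for obj in members) / len(members) for k in range(n_vars)]
        # Squared deviations, summed over members and variables.
        total += sum((obj[k] - mean[k]) ** 2 for obj in members for k in range(n_vars))
    return total


# Two ways of grouping the same three objects.
compact = [[(1.0, 1.0), (1.0, 3.0)], [(5.0, 5.0)]]
diffuse = [[(1.0, 1.0), (5.0, 5.0)], [(1.0, 3.0)]]
print(within_groups_ss(compact), within_groups_ss(diffuse))
```

The grouping that keeps the two nearby objects together yields the smaller value, which is the sense in which a low value indicates a compact solution.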

At each stage of the process, the Ward objective
function method merges those two objects (or clusters) which produce the
least increase in SS_w. Stated another way, this method
attempts to minimize within-cluster differences while maximizing
between-cluster differences.
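The merge rule can be sketched as a brute-force loop that, at every stage, evaluates each possible pair of clusters and carries out the merge that adds least to the within-groups sum of squares. This is an illustrative sketch rather than the program used in the study; the function names, the multi-variable treatment, and the stopping rule (a requested number of clusters) are assumptions.

```python
from itertools import combinations


def error_ss(members):
    """Sum of squared deviations of a cluster's members from the cluster mean."""
    dims = range(len(members[0]))
    mean = [sum(p[k] for p in members) / len(members) for k in dims]
    return sum((p[k] - mean[k]) ** 2 for p in members for k in dims)


def ward(points, n_clusters):
    """Agglomerate until n_clusters remain, always making the cheapest merge."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > n_clusters:

        def increase(pair):
            # Growth in total within-groups SS if this pair were merged.
            a, b = pair
            merged = clusters[a] + clusters[b]
            return error_ss(merged) - error_ss(clusters[a]) - error_ss(clusters[b])

        a, b = min(combinations(range(len(clusters)), 2), key=increase)
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for cluster in ward(points, 3):
    print(cluster)
```

Because every candidate merge is re-evaluated from the raw data, the sketch is slow for large n, but it makes the objective explicit: the cluster means are recomputed after each merge, and a merge is never undone.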

One feature of the Ward method is that the centroid for
each cluster changes after each merge takes place. Thus,
after a merge, the values of the group means $\bar{X}_i$ change (if for no
other reason than that the number of clusters is constantly
decreasing). This dynamic property may be viewed as both
a benefit and a drawback to the method. On one hand, this
property permits a more realistic approximation of the current
composition of members within clusters. On the other
hand, it tends to allow centroids to migrate towards outlying
objects that are forced into clusters by virtue of the
mutually exclusive and exhaustive nature of the clustering
process. The migrating centroid effect may cause objects to
be included in existing clusters rather than to be formed into
new ones.

The Ward method is used extensively in the present study.
Although the results are similar on the surface to those
from the Johnson method illustrated in Figure 2.1, it is important
to keep in mind that the Johnson and Ward methodologies
are quite distinct. The Ward method was chosen because of
its compatibility with the underlying assumptions and
objectives of this study and, moreover, its compatibility
with a non-hierarchical technique also used in this study.

B. Non-Hierarchical Clustering Schemes 

Unlike hierarchical clustering, non-hierarchical
clustering does not develop clusters through a progression
of stepwise merges. Instead, the user typically indicates
the number of clusters to be formed. The non-hierarchical
technique then attempts to place all objects into the specified
number of clusters in order to optimize a given criterion. Most
frequently, the criterion is the SS_w described above, although
other criteria have been suggested (Friedman and Rubin, 1967).

After the set of objects to be clustered is initially
broken into a desired number (specified in advance) of
partitions and objects are assigned to groups in either a
systematic or arbitrary fashion, non-hierarchical procedures
then proceed to reassign each object to the cluster that
best satisfies the objective criterion. The differing procedures
for initially partitioning and then reassigning objects in
order to optimize a criterion account for the variety of
specific non-hierarchical methods that have been suggested in
the literature (Ball and Hall, MacQueen, Forgy, Jancey, McRae,
Friedman and Rubin).
1. Illustration

To simplify the steps involved in non-hierarchical 
clustering, an illustration is provided in Figure 2.3. The 
data set used in this example, set forth in step 1, comprises
ten objects, lettered A through J, and two variables, X and Y. 
As pointed out above, non-hierarchical techniques often require 
an advance specification as to the number of clusters into 
which objects are to be sorted. For each cluster specified, 
the user will normally supply a seedpoint, which is merely a 
point in the measurement space around which clusters are 
expected to materialize. Since the present example is two- 
dimensional, a plot of the ten objects relative to the 
variables X and Y makes the task of specifying a suitable 
number of clusters and the location of their respective 
seedpoints considerably easier. The proximity of these 
ten objects represented in space suggests the presence of 
three clusters. They are, as outlined in step 2, a group consisting of
objects C, F, I and A, another with objects H, D, and E, and 
another with J, G, and B. The seedpoints, denoted by triangular 
marks, have been situated in such a way as to approximate 
centers of these clusters. 

The last two steps constitute an effort to improve the
original estimates of seedpoints and to adjust the member
composition of each cluster accordingly. How these steps
are accomplished is essentially defined by the non-hierarchical
algorithm itself. One approach may be first to recompute
cluster centroids by establishing an average or central point
for each group of objects as tentatively assigned in step 2.
All objects are then reassigned to that cluster having the
nearest centroid. It is important to note that step 3 may
result in significant alterations (especially without the
benefit of advance knowledge of the data structure) in the
original estimate of cluster centroids and membership.
Object D, for instance, has changed its cluster affiliation
between steps 3 and 4. Essentially, step 4 involves a repeat
of step 3, again for the purpose of refinement. In this
example, no further adjustments prove necessary beyond step 4
because no change occurs in cluster membership after the
centroids have been updated.

FIGURE 2.3

An Overview of Steps Undertaken in Nonhierarchical Cluster Analysis

2. Forgy's K-means Technique

One of the earliest non-hierarchical techniques proposed
was the K-means approach by MacQueen. With this technique, the
user specifies the number of clusters to be generated, for
example, g. Then the first g objects in a data set are arbitrarily
taken as representing the centroids for g clusters.
The remaining objects are considered in sequence and assigned
to the cluster whose centroid is least distant. After each
assignment, the cluster centroid is recalculated to reflect
the last entry. When all objects have been assigned to groups,
cluster centroids remain fixed. Because the original seedpoints
have been updated at each entry of an object to a cluster,
a final pass through the data is made in order to reassign
objects that have become closer to other centroids.
The final pass requires no further updating of the centroids.

As originally proposed by MacQueen, K-means is a two-pass
procedure. The first pass, just described, finds
centroids; a second pass makes final assignments of objects to
clusters. It is important to note that objects assigned to
clusters on the first pass may be assigned to different
clusters on the second pass. As indicated above, this is one
property that distinguishes non-hierarchical techniques from
hierarchical techniques.
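The two passes can be sketched as follows. This is a minimal illustration of the procedure as described in the text, not MacQueen's original program; squared Euclidean distance, the running-mean update, and all names are assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def macqueen_kmeans(points, g):
    # Pass 1: the first g objects serve as initial centroids. Each later
    # object joins the nearest cluster, and that cluster's centroid is
    # updated immediately as a running mean.
    centroids = [list(p) for p in points[:g]]
    counts = [1] * g
    for p in points[g:]:
        i = nearest(p, centroids)
        counts[i] += 1
        centroids[i] = [c + (x - c) / counts[i] for c, x in zip(centroids[i], p)]
    # Pass 2: centroids are held fixed and every object is assigned to the
    # nearest one, so first-pass memberships may change.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0), (9.0, 10.0), (0.0, 1.0)]
centroids, labels = macqueen_kmeans(points, 2)
print(labels)
```

Because the seeds are simply the first g objects, the outcome depends on the order in which the data are presented.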

It should also be noted that by assigning objects to
clusters based upon smallest distances, the K-means technique
is very similar to the Ward method, which attempts to minimize
the within-groups sum of squares. MacQueen (1967) presents
theoretical and empirical evidence of this similarity. Thus,
the K-means procedure and Ward's objective function procedure
share similar objectives but differ primarily in the arbitrary
procedures designed to achieve these objectives.

Forgy (1967) suggests modifying MacQueen's basic
K-means method in two substantial ways. First, he suggests
that the process continue iterating as long as an objective
function, such as the within-groups sum of squares, continues
to decrease. Second, he suggests that centroids not be
recalculated until the end of each iteration. These two
changes result in a non-hierarchical process similar to the
Ward method in its formal attempts to minimize an objective
function. The Forgy modifications, however, overcome two
of the major problems of the Ward procedure: the permanence of
cluster membership inherent in the hierarchical approach and
the difficulties associated with migrating means.
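The two modifications can be sketched as an iteration in which centroids are held fixed while every object is assigned, are recomputed only at the end of the pass, and the loop stops once the within-groups sum of squares no longer decreases. This is an illustrative sketch under those stated rules, not the program adapted for the study; the seed values, the `max_iter` safeguard, and all names are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def forgy(points, seeds, max_iter=100):
    centroids = [list(s) for s in seeds]
    labels, best = [], float("inf")
    for _ in range(max_iter):
        # Assign every object while the centroids are held fixed.
        trial = [nearest(p, centroids) for p in points]
        # Recompute centroids only now, at the end of the pass.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, trial) if lab == i]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else old)
        ss = sum(sq_dist(p, updated[lab]) for p, lab in zip(points, trial))
        if ss >= best:  # objective no longer decreasing: stop
            break
        centroids, labels, best = updated, trial, ss
    return centroids, labels, best


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels, ss = forgy(points, seeds=[(0.0, 0.0), (1.0, 0.0)])
print(labels, round(ss, 3))
```

In this small example the deliberately poor seeds place the third object in the wrong cluster on the first pass; it is reassigned on the second pass, which a hierarchical merge could not do.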

In all non-hierarchical procedures, the specification of
initial cluster centroids (seed points) is of great importance.
This specification may be done arbitrarily, as in MacQueen's method
in which the first g objects are taken, or it may be based
upon some substantive grounds. In particular, if the investigator
has some notion of where concentrations of objects
occur in the structure, he may wish to ensure that clusters
be given every chance to grow in that area. Thus, one
modification to the Forgy procedure that may be used, if
sufficient knowledge of the data structure is available, is
that of specifying values to serve as initial seedpoints.

One further observation: although both the Ward technique
and the Forgy technique attempt to minimize the within-groups sum
of squares, there is no guarantee that either technique
will reach the absolute minimum. The only way to ensure
the attainment of an absolute minimum sum of squares is through
complete enumeration of all possible partitions of the data set,
but even aided by today's
advanced computer technology, a complete enumeration is
unrealistic for all but very small data sets.
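The scale of a complete enumeration can be gauged by counting the distinct ways of partitioning n objects into g non-empty clusters, which is the Stirling number of the second kind. The sketch below uses the standard recurrence S(n, g) = g S(n-1, g) + S(n-1, g-1); the function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def partitions(n, g):
    """Number of ways to split n distinct objects into g non-empty clusters."""
    if n == g:
        return 1  # every object alone (also covers n == g == 0)
    if g == 0 or g > n:
        return 0
    # The nth object either joins one of the g existing clusters
    # or forms a new cluster by itself.
    return g * partitions(n - 1, g) + partitions(n - 1, g - 1)


print(partitions(4, 2))                # 7 ways to split four objects in two
print(partitions(10, 3))               # 9330
print(len(str(partitions(99, 10))))    # number of digits for 99 schools, 10 clusters
```

Even ten objects in three clusters give several thousand candidate partitions, and the count for 99 schools in 10 clusters is far beyond anything that could be evaluated one partition at a time.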

The two clustering procedures chosen for the present
study are used in successive stages. First, a hierarchical
cluster analysis using Ward's objective function is performed. Second,
results from Ward's method provide seed points for analysis
of the same data by Forgy's non-hierarchical procedure.
Computer programs for both of these procedures have been
obtained and adopted for use on the AAMC's Institutional
Profile System. The use of these procedures is illustrated
in Chapter IV. The procedures are now available for application
to subsets of data in the Institutional Profile database.
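
The hand-off between the two stages can be sketched as follows. This is a simplified illustration, assuming the standard Ward merge cost (the increase in the error sum of squares when two clusters are joined) and a hypothetical function name `ward_seeds`; it is not the program obtained for the study.

```python
def ward_seeds(points, g):
    """Merge clusters by Ward's criterion until g remain.

    Returns a list of (member_indices, centroid) pairs; the centroids
    can serve as seed points for a non-hierarchical procedure.
    """
    clusters = [([i], tuple(float(x) for x in p)) for i, p in enumerate(points)]
    while len(clusters) > g:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                # Increase in the error sum of squares caused by this merge.
                cost = na * nb / (na + nb) * sum((x - y) ** 2 for x, y in zip(ca, cb))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters[a] = (ma + mb, merged)
        del clusters[b]
    return clusters
```

The centroids returned at the chosen number of clusters are then passed as the initial seed points to the second stage, which is free to move objects between clusters.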
Chapter III
DATA

Beyond variations evident in the objectives defined
by clustering methodologies, the properties assumed by any
configuration of clusters are essentially a reflection of
the data employed. This chapter focuses on the development
of a data set suitable for use with cluster analysis. The
two areas to be considered are, first, the availability and
selection of variables from the Institutional Profile
System and, second, the preparation of the data for analysis.

The Institutional Profile System (IPS) is a computerized
information retrieval system with a large data storage
capacity and software to perform various statistical and data
summary functions. Currently, the data base includes data on
U.S. medical schools for 16 years and contains over five
thousand variables from 49 source questionnaires. The sources
of interest for the present study are primarily the Liaison
Committee on Medical Education Questionnaire: Parts I and II,
1973-74.

A. The Data Set: Availability and Selection

The principal objective in developing a data set was to
assemble a comprehensive set of variables with a sufficiently
broad, yet detailed, perspective of medical education to
facilitate exploratory analyses. In addition to this study,
two other exploratory studies under this contract also
required utilization of such a data set (Sherman, 1975; Sedlacek,
1975). Basically, then, the variables in the data set are
intended to have a general rather than contextual descriptive
value.

The contract specification related to this study provides
for the development of cluster analytic methods that may be
used as an additional function within the IPS statistical
package. Because, however, the cluster algorithms made
available to the organization are not compatible with the
programming language used in IPS, all analyses conducted for this
paper were done external to IPS. The first step, then, involved
extraction of relevant data from IPS in order that these
analyses might be performed. The studies conducted by Sherman
and Sedlacek also required an external application of the
data for use in the statistical programs available in SPSS,
the Statistical Package for the Social Sciences (Nie, 1975).

Variable selection involved the identification of the
most current and meaningful data available. Selection began
with the most current IPS data for 117 medical institutions.
Although the bulk of the data were available for academic year
1973-74, the most current financial data were for 1972-73.

In attempting to formulate a full spectrum of salient
institutional descriptors, suggestions were elicited from
AAMC staff representing a number of specific areas in
academic medicine. Other potentially useful descriptors
were noted in the studies by Richards (1966), Rodgers,
Otis, and RAND and were organized into the logical domains
given in Table 3.1. To facilitate comparison, variables
used by each study are identified by an X in the columns
to the right. The emphasis of each study on particular
variable domains, as shown in this table, varies considerably.
The most obvious omissions occur in the faculty domain,
with the exception of RAND, and in the curriculum domain
for all but Otis' study.

The more current and extensive data available in IPS
allow an expansion of the data set for the present study
to 350 variables. The set contains approximately 220
variables taken directly from source documents and an
additional 130 variables derived from the original 220
(mostly ratios and percentages). The entire set is listed
in Appendix A. A summary of the expanded domains is
given in Table 3.2.

B. Data Preparation 

The development of a data set, particularly one
of this size, requires a number of preliminary tasks.
These tasks include organizational considerations, such
TABLE 3.1

[The columns of X marks identifying the studies that used each
variable are not recoverable from this copy.]

I. THE INSTITUTION

   A. ORGANIZATION, PHYSICAL FACILITIES, SETTING AND GENERAL CHARACTERISTICS
       1. Library volumes per student
       2. Ratio of number of beds in teaching hospital to no. of medical students
       3. % of total beds in university hospital
       4. Private vs. public
       5. Age of institution
       6. Growth rate
       7. Size of community in which located

   B. ADMISSIONS
       1. No. of applicants per place available
       2. % of male applicants
       3. Average no. of applications per applicant
       4. Accept transfer students
       5. % of out-of-state students
       6. Ratio of entering to applying students
       7. % of foreign students in entering class
       8. % of part-time and special students in student body
       9. % of entering students completing 4 years of college

   C. FINANCES
       1. Decile Federal research funds
       2. Dollars from sponsored programs per student
       3. Decile total Federal sources of support
       4. Decile unrestricted endowment funds
       5. % of school's HEW contribution for science research
       6. % total expenditures for sponsored programs
       7. % of school's HEW contribution for all other non-science
       8. % of school's HEW contribution for science training
       9. % total expenditures for regular operating budget
      10. % of school's HEW contribution for non-science training
      11. % of school's HEW contribution from environmental health services
      12. % of school's HEW contribution from health services and mental
          health administration
      13. % of total federal obligations
      14. % of school's HEW contributions from NIH
      15. % of school's HEW contribution for science training
      16. Private funding sources
      17. Tuition cost

II. THE FACULTY

   A. COMPOSITION (FACULTY MIX)
       1. No. of full-time faculty
       2. Ratio of part-time faculty to full-time faculty
       3. Ratio of volunteer faculty to full-time faculty

   B. SALARY

   C. FINANCES
       1. Research funds per faculty member
       2. % of faculty salary from Federal dollars
       3. Sponsored program expenditures per full-time faculty

   D. PROFESSIONAL EMPHASIS
       1. Faculty per student ratio
       2. % teaching responsibility for clinical fellows

III. THE STUDENT

   A. COMPOSITION (STUDENT MIX)
       1. Decile MCAT science score
       2. % of males in final year
       3. % of first year students of student body
       4. Ratio of no. final year students to first year students
       5. No. of graduates
       6. Total enrollment in post doctoral B.S. program
       7. Interns in major teaching hospitals
       8. No. residents in major teaching hospitals
       9. Ratio of interns and residents to medical students
      10. Ratio of masters and doctorates in B.S. to medical students
      11. Ratio of student equivalents to medical students
      12. % males in student body
      13. No. graduate degree candidates in B.S.
      14. No. post doctorate fellows in B.S. and C.S.
      15. ...
      16. Ratio of residents to medical students

   B. STUDENT AID
       1. Financial aid

   C. FINANCES
       1. Total expenditures per student
       2. Dollars training support per student
       3. ...
       4. Expenses for books and supplies for first year students

   D. CURRICULUM AND PROGRAM
       1. No. of residency programs
       2. No. of types of residency programs
       3. No. of intern programs
       4. Weeks of required clerkship
       5. % of instruction devoted to B.S. requirements
       6. % of instruction devoted to clerkship requirements
       7. Year required clerkship introduced
       8. Total weeks of instruction
       9. No. of types of internship programs
      10. Elective emphasis
      11. % elective time
      12. All elective final year

   E. ENROLLMENT
       1. Total size/enrollment
       2. Size of first year class
       3. Size of final year class
       4. ...
       5. Total students (affiliated)

   F. OUTPUT CHARACTERISTICS
       1. Specialty Board Certification rate
       2. Residency preference
       3. Completion rate/attrition
       4. Ratio of doctorates conferred to total enrollment
Table 3.2

                                        No. Variables*

I.   INSTITUTION                            (22)
     A. General Characteristics              14
     B. Demographic                           5
     C. Library Facilities                    3

II.  FINANCES                               (86)
     A. Revenues                             37
     B. Expenditures                         21
     C. NIH Awards                            6
     D. Construction Costs                   14
     E. General                               8

III. ACADEMIC PROGRAM                       (39)
     A. General                              11
     B. Curriculum                           28

IV.  FACULTY                                (48)
     A. Staff                                32
     B. Salary                               16

V.   STUDENT ADMISSIONS                    (164)
     A. Enrollment                           69
     B. Entering Qualifications              30
     C. Student Aid                          40
     D. Expenses                              6
     E. Student Selection                    14
     F. Career Review                         5

* Parentheses denote sub-totals per variable domain
as the formatting, labeling, transformation, and storage
of data. Additional steps entail the creation of new
measures from existing variables and, finally, verification
of the entire data set.

Because of the operational benefits provided by the
statistical programming package, SPSS, the IPS interface
function was used to remove data from the system prior
to undertaking these preliminary steps. On the basis of
the 220 variables extracted from IPS, an additional 130
derived measures were computed. Most of these measures
represented the creation of percentages and ratios.
The final preparatory step involved performing univariate
frequency tabulations and summary statistics on the
350 variables. These computations provided the basic
documentation needed for verifying the substance of the
data. Additionally, an analysis of the incidence of
missing data for the entire selection of variables was
performed. The results indicated that while the overall
incidence of missing data was negligible, it did occur
in high concentrations among ten percent of the
institutional population. The findings in these data summaries
were treated separately according to the needs of this
study and the Sherman and Sedlacek studies.
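
The creation of derived measures can be illustrated with a short sketch. It assumes that a missing source value is carried through to the derived measure rather than replaced; the function names and the example figures are hypothetical, and the derivations actually used were carried out in SPSS.

```python
def ratio(numerator, denominator):
    """Derived ratio; None when an input is missing or the base is zero."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def percent(part, whole):
    """Derived percentage built on the same missing-data rule as ratio()."""
    r = ratio(part, whole)
    return None if r is None else 100.0 * r
```

For example, `ratio(30, 120)` gives a part-time to full-time faculty ratio of 0.25 for a hypothetical school, while `ratio(None, 120)` stays missing, so that the later tabulation of missing data reflects gaps in the source variables.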
Chapter IV

APPLICATION

In this chapter, the Ward and Forgy clustering methods
are applied to selected variables extracted from the
database described in Chapter III. The variables have been
specifically chosen to approximate closely those used by
the RAND Corporation (Keeler et al., 1972) in a study designed
to classify medical schools. This chapter, then, is an attempt
to verify the RAND study. Such an effort is expected to serve
three purposes: (1) to test the adequacy
of the methods developed by AAMC and by RAND, (2) to test the
adequacy of the data analyzed by AAMC and by RAND, and (3) to
detect possible changes in medical education over time as
reflected by the measures analyzed.

A. The RAND Study 



In 1972, the RAND Corporation was commissioned to conduct
a broad study of the effects of federal programs on academic
health centers. The project initially required a selection of
ten medical schools that would be representative of all medical
schools in the United States. To accomplish this task, RAND
researchers selected these institutions by classifying medical
schools into ten groups and choosing one school from each
group for study.

The RAND study utilized classificatory methods similar
to those presented in Chapter II. They selected 31 variables
deemed broadly descriptive of medical education and obtained
data for 94 medical schools. The first phase of their analysis
involved a factor analysis of the 31 variables which, in turn,
yielded six common factors.*

Factor scores were then computed for each of the six 
factors for each institution and submitted to non-hierarchical 
cluster analysis for which ten clusters were specified. The 
results of this analysis are presented in Table 4.1. 

To summarize, RAND conducted factor and cluster analyses, 
first, to isolate underlying dimensions existing in their 
selection of variables and, second, to identify distinct groups 
on which to base a representative selection of medical schools. 

B. Replication: Factor Analysis 

Replication of the RAND Study involves two distinct
steps. The first step is undertaken in Sherman's study
(1975), in which 23 variables comparable to RAND's 31 are
submitted to factor analysis. The second step, detailed in this
chapter, is to cluster medical schools based on the six factors
identified by Sherman.

A list of the variables used by Sherman and RAND is provided
in Table 4.2. By utilizing the same set of procedures as did

* Common factor analysis, followed by equimax rotation.
TABLE 4.1

CLUSTER 1 (13 MEMBERS)
Oregon
Ohio State
Colorado
Kentucky
LA, New Orleans
Tennessee
Minnesota
Med College of GA
Arkansas
Kansas
Texas, Southwestern
SUNY, Buffalo
Indiana

CLUSTER 2 (5 MEMBERS)
UC-Davis
Michigan State
LA, Shreveport
UC-Irvine
Mount Sinai

CLUSTER 3 (11 MEMBERS)
Med C of VA
Maryland
Med College of Wisconsin
Northwestern
Wayne State
SUNY, Downstate
Hahnemann
Thomas Jefferson
Illinois
Loma Linda
U of Michigan

CLUSTER 4 (10 MEMBERS)
Case Western
Columbia
U of Pennsylvania
NYU
UCLA
UCSF
Harvard
Yeshiva, Einstein
U of Washington
USC

CLUSTER 5 (13 MEMBERS)
Oklahoma
Puerto Rico
Vermont
SC
U of VA
Mississippi
UNC
Louisville
Missouri
Nebraska
West Virginia
Iowa
U of Wisconsin

CLUSTER 6 (3 MEMBERS)
Med College of Ohio
UC-San Diego
Arizona

CLUSTER 7 (12 MEMBERS)
Pittsburgh
Cincinnati
NJ Med School
Temple
SUNY, Upstate
Bowman Gray
Miami
Florida, Gainesville
Cornell
Texas, Galveston
Texas, San Antonio
Penn State

CLUSTER 8 (10 MEMBERS)
Yale
Washington, St. Louis
Emory
Johns Hopkins
Stanford
Duke
Vanderbilt
Rochester
Baylor
U of Chicago

CLUSTER 9 (13 MEMBERS)
Tulane
Georgetown
Med C of PA
Boston
Loyola, Chicago
Albany
Saint Louis
NY Med
Chicago Med
Tufts
Howard
George Washington
Creighton

CLUSTER 10 (4 MEMBERS)
Utah
Alabama
U of New Mexico
Meharry
TABLE 4.2

Variables Used in Replicated Factor Analyses

RAND (1972)

 1. Medical Students
 2. Interns in Major Teaching Hospitals
 3. Residents in Major Teaching Hospitals
 4. State or Private School Status
 5. Unrestricted Endowment (decile)
 6. MCAT Science Scores (decile)
 7. Percent Faculty Salary from Fed $ (decile)
 8. State Medicaid Program
 9. Percent NIH Research Applications Approved
10. Average Priority Score
11. Population SMSA/Total Medical Students SMSA
12. NIH Research and Training Grant $ (FY 1971)
13. Total Students
14. Percent of Medical Students from Home State
15. Special Project $/Total Students
16. Log (1972 - year organized)
17. Percent of Total Beds in University Hospital
18. Percent of Total Beds in VA Hospital
19. Part-time Faculty/Full-time Faculty
20. Volunteer Faculty/Full-time Faculty
21. Full-time Faculty/Total Students
22. Sponsored Program Expenditures/Full-time Faculty
23. Regular Operating Expenditures/Total Students
24. Total Expenditures/Total Students
25. Sponsored Program Expenditures/Total Expenditures
26. (Interns & Residents)/Medical Students
27. (Masters & Doc. in Basic Science)/Medical Students
28. Financial Distress $/Regular Operating Expenditures
29. $ Weighted Priority Score - Priority Score
30. $ Weighted Fraction Approved - Fraction Approved
31. Other Student Equivalents/Medical Students

AAMC'S REPLICATION (1975)

 1. Medical Students (73-74)
 2. Total Interns Instructed by MC Faculty (72-73)
 3. Residents Instructed by MC Faculty (73-74)
 4. Public or Private Control (73-74)
 5. Tot MC Rev from Unrestricted Endowments (72-73)
 6. MCAT Science Scores of 1st Year Medical Student (73-74)
 7. Percent Sponsored Faculty Salary from Federal $ (72-73)
 8. SMSA Population per Medical Student (73-74)
 9. NIH Awards - Research Grants $ (73-74)
10. Total of all Students Instructed at MC (73-74)
11. Percent of Medical Students from Home State (73-74)
12. Special Project $ per MD Students (72-73)
13. Age: Log (1974 - year organized)
14. Part-time Faculty/Full-time Faculty
15. Volunteer Faculty/Full-time Faculty
16. Full-time Faculty/Total Students (73-74)
17. Sponsored Program Expenditures/Full-time Faculty
18. Regular Operating Costs per MD Student (72-73)
19. Total Expenditures/Total Students (73-74)
20. Sponsored Program Expenditures/Total Expenditures
21. (Interns & Residents)/Medical Students (73-74)
22. (Masters & Doc. in Basic Science)/Med Students
23. Medical Student Equivalents/Medical Student (73-74)
RAND, that of common factor extraction and equimax rotation,
Sherman finds essentially the same six factors based on 117
institutions as opposed to RAND's 94. They are, as presented
in Table 4.3: (1) graduate medical education programs, (2)
Federal research involvement, (3) undergraduate medical
education programs, (4) reliance on non-full-time faculty,
(5) public versus private control, and (6) non-M.D. education
programs.

A factor, which may be viewed as simply a synthetic
variable, is a condensation of a group of variables into a
single expression. Each school's position along the descriptive
dimension represented by a "factor" (such as the second,
"Federal research involvement") is given by a "factor score"
computed from the input variables using a formula derived
by the factor analysis. There are six such formulae, one for
each factor. Each school, then, has six factor scores that
replace values for the 23 variables. The computed similarity
of two schools is a composite measure of the similarity of their
six factor scores. This composite measure includes a set
of numerical weights to reflect the subjective importance of
each of the six dimensions in determining the schools'
similarity. The present study uses the same numerical weights
assigned by RAND.
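
One form such a composite measure can take is sketched below. The weighted Euclidean form and the function name `weighted_distance` are assumptions made for illustration; the weights assigned by RAND are not reproduced in this chapter, so any weights passed in are placeholders.

```python
import math


def weighted_distance(scores_a, scores_b, weights):
    """Weighted Euclidean distance between two schools' factor scores.

    A larger weight makes differences on that factor count for more;
    a smaller distance means the two schools are more similar.
    """
    if not (len(scores_a) == len(scores_b) == len(weights)):
        raise ValueError("scores and weights must have the same length")
    return math.sqrt(
        sum(w * (a - b) ** 2 for a, b, w in zip(scores_a, scores_b, weights))
    )
```

With all six weights set to 1.0 the measure reduces to the ordinary Euclidean distance; raising the weight on one factor makes schools that differ on that dimension appear less similar.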
TABLE 4.3

Factor Pattern Matrix from Analysis of
RAND Study Variables (Using New AAMC Data)
By Method of Common Factors and Equimax Rotation

                                                              FACTOR
VARIABLE  VARIABLE LABEL                             1      2      3      4      5      6

Graduate Medical Education Programs
V6310     TOT RESDNTS INSTR BY MD FAC 73-74         .86    .22    .28   -.03    .07   -.03
...       TOT INTERNS INSTR BY MD FAC 72-73         ...    ...    ...    ...    ...    ...
V6050     ENROL RATIO: INTERNS & RESDNTS TO MD STU  ...   -.10   -.29    .09    .11    .22
V6010     TOT STUDENTS-ALL-INSTRUCTED AT MC         .61    .38*   ...   -.17   -.12    .21

Federal Research Involvement
V3350     SPONS PROG EXPD PER FT FAC                .10    .86    .19   -.00    .11    .09
V3345     MC EXPD-REG OP COSTS PER MD STUDENT       .24    .61   -.19    .50    .09    .17
V2830     MC EXPD-PCT SPONS PROG EXPD OF TOT        .04    .67    .25   -.06    .34*   .12
V2940     NIH AWARDS RESRCH GRANTS $1000 73-74      .42*   .58    .06    .35*   .30    .36*
...       MEAN MCAT SCORE SCI-1ST YR MD STUDENTS    .38*   .41    .00    .09    .23    ...
V2820     PCT SPONS FAC SALARY FROM FED $ 72-73     .14   -.32   -.12    .02   -.20    ...

Undergraduate Medical Education Programs
V6020     ENROL-TOT MD STUDENTS 73-74               .30    ...    .02    ...    .08    .08
...       SMSA POP PER MD STUDENT                   .17    .06    ...   -.05   -.27

Non-Full-Time Faculty
V5025     RATIO FT FAC TO TOTAL STUDENTS            .11    ...   -.29    .76    .05    .04
V2750     TOT MC EXPD PER TOTAL STUDENTS            .19    .52   -.12    .81    .06   -.07
V5040     RATIO VOL FAC TO FT FAC                   .05    .02   -.45*  -.41   -.16   -.37
V2740     SPECIAL PROJ $ PER MD STUDENT 72-73       .20    .11   -.11   -.34   -.00    .01

Public vs. Private Control
...       CONTROL TYPE (1=PRIVATE, 0=PUBLIC)        .10    .05    .01    .07    .78   -.10
...       PCT MD STUDENT FROM HOME STATE            .11    ...    .16    ...   -.20

[... marks an entry missing or illegible in this copy; values in
incomplete rows are listed in source order, and the remaining
rows of the table are not recoverable.]

C. Replication: Cluster Analysis 

1. Analysis by Ward's Objective Function Method 

Based on the six factors identified by Sherman, two
independent cluster analyses are conducted. The first
cluster application, discussed in this section, is the
Ward objective function method.
The application of the Forgy method follows in Section 2.

Although Sherman's factor analysis replication employs
117 medical institutions, 99 have been retained for cluster
analysis. Data for the other 18 schools are missing for
four (17%) or more of the 23 original variables. Of the 99
medical schools used in this study, 83 are in common with the
94 used by RAND.
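
The retention rule can be stated compactly. The sketch below assumes each school is represented by a list of its 23 variable values with `None` marking a missing value; the function names and the toy data in the usage note are hypothetical.

```python
def n_missing(values):
    """Count the missing (None) entries in one school's variable list."""
    return sum(v is None for v in values)


def retain(schools, max_missing=3):
    """Keep schools with at most max_missing missing values.

    With the default, a school missing four or more values is dropped.
    """
    return {
        name: values
        for name, values in schools.items()
        if n_missing(values) <= max_missing
    }
```

Applied to a dictionary such as `{"School A": [...], "School B": [...]}`, the function returns only those schools complete enough to be clustered.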

The particular variation of Ward's technique used in
this analysis (Wishart, 1969) requires a symmetric matrix of
Euclidean distances. In other words, a distance is computed
for each pair of schools to reflect a composite of their
differences on the six factor scores. All such combinations
are stored in a 99 x 99 matrix and used as input for analysis
by the Ward method.
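
A minimal sketch of building such a matrix follows, assuming unweighted Euclidean distance and one row of factor scores per school; the function name `distance_matrix` is hypothetical and the sketch is not Wishart's program.

```python
import math


def distance_matrix(scores):
    """Symmetric matrix of Euclidean distances between rows of scores."""
    n = len(scores)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is computed once and stored in both triangles.
            d[i][j] = d[j][i] = math.dist(scores[i], scores[j])
    return d
```

Only the pairs above the diagonal need to be computed: n schools yield n(n-1)/2 distinct distances, which is 4,851 for 99 schools.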

The result of the Ward analysis is provided in the form
of a dendrogram in Figure 4.1. The dendrogram is condensed
in order to reflect stages of the merge process rather than
the full 98 individual (n-1) merges. These steps are
equally incremented to preserve the actual error sum of squares
needed to accomplish each merge. The 99 institutions analyzed
are in the left column of the figure and the sequence number
for 25 merge stages is shown across the top.

Figure 4.1 indicates that Arkansas, Louisville, and
Louisiana-New Orleans merge at stage 1, that Tennessee joins
this cluster at stage 2, and that this cluster does not
admit any new members until stage 6, when a cluster consisting
of Mississippi, Oklahoma, and Puerto Rico joins it.

From a broad perspective, the dendrogram reveals an elongated
pattern of cluster growth. Merges between schools appear to
occur fairly uniformly throughout the procedure; however, during
the earlier stages they branch laterally for some distance
before expanding vertically to admit larger numbers of
institutions. This trend indicates that group structure is, on the
whole, relatively introspective and confined to small
memberships.

To illustrate this point, note that clusters forming up
to stage 5 are numerous but generally contain only two to
four members. In fact, at level 5 there are 44 clusters, or an
average of only 2.25 schools per cluster. The largest cluster
contains six schools, while a total of 17 schools have not as
yet merged. If, for example, a single cluster is to be formed
from the first 18 institutions listed in the dendrogram, the
criterion level is found to be "12." The internal structure
of this cluster in turn is made up of three subgroups of roughly
equal size. The criterion level has to be doubled, from
level 6 to level 12, in order to tolerate the merging of
these three subclusters.

In an earlier example, Figure 2.2, a plot of the number
of clusters existing at any given stage against the critical
distance has been suggested as a guide in selecting an
optimal solution. It is perhaps more meaningful in the
present context to plot the change in the sum of
within-group deviations incurred at each merge. Additionally,
to reduce the effects of local extremes, change in the sum
of deviations is expressed as five-element rolling averages.
The resulting plot is given in Figure 4.2.
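
The smoothing step can be sketched as follows. The sketch assumes the input is the cumulative sum of within-group deviations recorded after each merge and that each average covers five consecutive changes; whether the window in the study was centered or trailing is not stated, and the function names are hypothetical.

```python
def merge_increments(cumulative):
    """Change in the sum of within-group deviations at each merge."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def rolling_mean(values, window=5):
    """Averages over each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Plotting `rolling_mean(merge_increments(cumulative))` against the merge number damps isolated spikes, so that a sustained rise in the cost of merging stands out as a transition point.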

A plot of this kind often reveals disproportionate jumps
in group deviations as the merges progress.
Figure 4.2, for instance, reveals perhaps two
transitional points in the curve. The first point is at
the eighty-first merge, where 18 clusters have been formed;
the second, at the ninetieth merge with nine clusters.
These two transitional points suggest two optimal solutions.
Schools grouped in the 18-cluster and 9-cluster solutions
are given in Appendix B.

Ultimate determination of the "optimal" number of
clusters is, of course, primarily subjective. Such
determination must take into account the nature of the data
analyzed, the methods used, and the particular goals of the
analysis. The purpose of using hierarchical clustering
is, first, to survey the structure of the data set and, second,
to provide seedpoints for the non-hierarchical analysis.
Since the overall objective is to replicate the RAND Study,
a judgment as to an optimal solution is predicated upon
RAND's choice of ten clusters. Because there are 18 schools
in the present study not considered in the RAND Study, it
has been determined necessary to use a 16-cluster solution
from the Ward analysis to generate seedpoints for the
subsequent non-hierarchical application. The cluster seedpoints
and cluster sizes for the 16-cluster solution are given in
Table 4.4.

2. Analysis by Forgy's K-Means Procedure

In the preceding application of the Ward method, a 99
x 99 similarity matrix has been used as input. In applying
the Forgy method, however, the raw factor scores for each
institution are employed. Again, these data consist of
factor scores for six factors for each of 99 institutions.

The within-clusters summed deviations for the 16-cluster
hierarchical solution is 109.15. The Forgy procedure,
taking 5 iterations to converge, lowers this value to 97.61.
A summary of the Forgy solution is given in Figure 4.3, in
which the mean scores on each of the six factors for these
16 clusters are depicted. The names of schools belonging to
each cluster are also given. Note that the line MD is
the overall mean based on all 99 institutions.
TABLE 4.4
Assignment of Seedpoints*

CLUSTER  NUMBER OF                   SEEDPOINT COORDINATES
NUMBER   MEMBERS    FACTOR 1  FACTOR 2  FACTOR 3  FACTOR 4  FACTOR 5  FACTOR 6

   1        18        -.560     -.420      .520      .100     -.670     -.120
   2        11         .150     -.510      .240     -.110     -.320      .360
   3         7         .280      .430     -.400     -.290     -.530      .200
   4         3        3.170      .470      .760     -.180    -1.040     -.610
   5         4         .820      .800     1.080     -.700    -1.270      .890
   6         8         .270     -.660    -1.670     -.180     -.580     -.060
   7         3         .310     2.030    -1.590     -.360     -.690      .340
   8         2       -2.020      .590    -1.300     -.720      .180    -1.090
   9         1        4.660    -3.340    -3.780     -.500      .125     2.030
  10         4        1.960     -.360     -.110      .190     1.250      .010
  11         2         .080     -.270    -1.160     -.810     1.170      .900
  12         4         .060      .870     -.320      .440     1.510      .380
  13         9        -.490     -.430      .480      .110      .620     -.350
  14        10         .290      .070      .180     -.480      .890     -.070
  15         3       -1.370     -.420     -.840     1.280    -1.680     -.570
  16        10        -.680     1.200      .610      .900      .280     -.380

* Seedpoint coordinates are derived by computing the mean score for each cluster on each of
six factors.
FIGURE 4.3

Cluster Membership and Profile Summary
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The first cluster contains 16 schools, all of which are 
public* As a group, these schools are somewhat below the 
mean in graduate program involvement and federal research 
involvement but generally have more undergraduate education 
programs* The profile of cluster 13 is virtually identical 
except with respect to control type* All the schools belonging 
to cluster 13 are private* 

The profiles of clusters 2 and 14 are also similar to 
each other on all dimensions but control type* While the 
bulk of cluster 2 contains public schools, cluster 14 is 
entirely made up of private schools* Although cluster 14 
is slightly closer to the roean on a n s i x factors, the pro- 
files on both indicate group structures characterized by 
higher emphasis on graduate and undergraduate education pro- 
grams and a relatively low emphasis on federal research 
involvement, non^full-time faculty, and nonHM*D* programs* 

Clusters 3 and 7 have similar attributes except they 
differ in intensity* Both clusters include public schools 
that have larger than average graduate and non-M*D* 
programs and generally smaller undergraduate programs with 
less reliance on full-time faculty* However, cluster 7 
has a much greater federal research commitment and a some- 
what smaller undergraduate educational program* 

Cluster 4 groups three public medical schools whose
prominent feature is their large graduate educational
program. Otherwise, these schools have a relatively high
involvement in both federal research and undergraduate education,
yet are somewhat below the mean in their reliance on
non-full-time faculty and in non-M.D. programs offered. The four
medical schools in cluster 5 have much the same attributes,
with two exceptions. First, the graduate education program
is smaller and, second, the level of non-M.D. programs offered by
these schools is comparatively high.

The differences between clusters 10 and 11 lie
primarily in the intensity of scores on the six factors. First,
the five schools in cluster 10 and the two schools in cluster 11 are
all private. Both clusters have a lower than average
commitment to federal research. Cluster 10 has a rather sizeable
graduate education program with an average undergraduate
education program and average reliance on non-full-time faculty.
The profile of cluster 11 on these three factors is,
comparatively speaking, somewhat lower.

The profiles of the remaining five clusters are not
directly comparable because they approach varying
extremes. Two of these clusters may well be considered
outlying groups. Cluster 9, for instance, contains only
Mayo. It can be readily seen from its profile that Mayo
is quite unlike any other medical school in that it offers
very large graduate and non-M.D. programs with an absence
of involvement in federal research and undergraduate programs.
The two medical schools grouped in cluster 8, Nevada and
Eastern Virginia, are new schools and, as one would expect,
fall into the lower extremes. In terms of research
involvement, however, cluster 8 is slightly above the mean.

Cluster 15 is also made up of new medical schools. As 
a group, they differ from cluster 8 in their reliance on 
non-full-time faculty. 

Cluster 12 consists of a group of all private schools 
distinguishable by heavy research involvement and reliance 
on non-full-time faculty. In all other regards, this cluster
exhibits average characteristics. 

A group of six public schools forms cluster 6. These
schools, besides having relatively extensive graduate
programs, are otherwise on the lower extreme of the
spectrum of variables.

Finally, the profile of cluster 16, which includes a 
mixture of public and private schools, indicates a lesser 
emphasis on graduate and non-M.D. educational programs,
though the emphasis on undergraduate programs is substantial. 
Furthermore, the schools do have a significant involvement 
in federal research and reliance on non-full-time faculty. 

D. Comparison of Results 

A relatively simple way of comparing group membership
between two sets of clustering results is to tabulate the
number of matches and mismatches, as
shown in Table 4.5. In the left-hand column may be found
the cluster identification numbers assigned by RAND to its
resulting clusters. The cluster identification numbers
shown across the top of the table are those evolving from
the Forgy application to the replicated factors. For ease
of interpretation, columns have been rearranged so as to
reflect results of the two studies in corresponding order.
In this way, clusters that are most highly associated in
terms of membership fall roughly into cells along the
diagonal.
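
A cross-tabulation of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: the function names and the eight cluster labels are hypothetical, and the greedy column ordering is merely one simple way of pushing large counts toward the diagonal; the report does not state how its columns were rearranged.

```python
from collections import Counter

def cross_tabulate(rows, cols):
    """Count how many objects fall in each (row cluster, column cluster) cell."""
    if len(rows) != len(cols):
        raise ValueError("both label lists must cover the same objects")
    return Counter(zip(rows, cols))

def diagonal_order(table):
    """Greedily pair each row cluster with the unused column cluster it shares
    the most members with, so that large counts fall near the diagonal."""
    row_ids = sorted({r for r, _ in table})
    unused = sorted({c for _, c in table})
    order = []
    for r in row_ids:
        if not unused:
            break
        best = max(unused, key=lambda c: table[(r, c)])
        order.append(best)
        unused.remove(best)
    return row_ids, order + unused

def render(table, row_ids, col_ids):
    """Format the counts as a plain-text table, rows by columns."""
    lines = ["     " + "".join(f"{c:>4}" for c in col_ids)]
    for r in row_ids:
        lines.append(f"{r:>4} " + "".join(f"{table[(r, c)]:>4}" for c in col_ids))
    return "\n".join(lines)

# Hypothetical cluster labels for eight schools (not data from either study).
study_a = [1, 1, 1, 2, 2, 3, 3, 3]
study_b = [7, 7, 5, 5, 5, 6, 6, 7]

table = cross_tabulate(study_a, study_b)
row_ids, col_ids = diagonal_order(table)
print(render(table, row_ids, col_ids))
```

Counts that remain off the diagonal after reordering correspond to schools grouped differently by the two solutions.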

For the medical schools assigned by RAND to cluster
1, for example, consistency with the replicated cluster
findings is minimal. The fifteen members of RAND's cluster
1 occur in small groups ranging across half the spectrum
of the clusters derived from the replicated version.
With few exceptions, much the same lack of conformity exists
throughout. Notably, the exceptions are RAND's cluster 5
and, to a lesser extent, clusters 8 and 9. Members belonging to
clusters 8 and 9 account for the bulk of private schools and
differ mainly in regard to relative wealth and research
orientation. The characteristics of the membership of
cluster 5 are less obvious. Generally, the medical schools
belonging to this cluster are public institutions with
relatively low federal research involvement and graduate program
emphasis.
TABLE 4.5

AAMC Cluster Numbers
As Assigned to Forgy Results
As stated earlier, the number of clusters formed in
the current analysis has been increased beyond RAND's ten to
16 in order to provide for the effects of imposing a fixed
number of clusters on differing populations. Specifically,
the concern has been to allocate a sufficient number of
clusters to enable medical schools not in the RAND study to
develop into groups external to RAND's ten. As such, the
several new medical schools forming the replicated clusters
8, 9, 11 and 15 are completely absent from the RAND analysis.
The schools belonging to these four clusters are therefore separated
out, thereby making the respective solutions more readily
comparable. As a further note, no apparent combination of
the 12 remaining replicated clusters enhances the overall
conformity with the RAND clusters without, in turn, obscuring
other equally important group distinctions.
Chapter V
Discussion and Conclusion

The present paper reports on two aspects of an attempt
to utilize rather sophisticated statistical procedures
to shed light on the "structure" of medical education in
the U.S. The first aspect is primarily methodological,
and it involves the descriptions in Chapter II and Chapter
III regarding multivariate cluster analysis procedures and
manipulation of available data into formats amenable to
analysis by such procedures. The second aspect is
primarily substantive, and it involves the description in
Chapter IV regarding the generation of a classification
of medical schools based on the methods and data developed,
and comparison of these results to earlier efforts.

The discussion of the results reported, then, logically 
falls into two areas: methodological and substantive. 
The methodological aspect involves two components: choice 
of analysis procedures and construction of a researchable
data set. The present chapter is organized along these 
lines, with a final section devoted to conclusions that 
may be drawn from the report. 

A. Methodological Considerations 

1. Cluster Analysis Procedures 

The cluster analysis procedures chosen for the present
work are relatively new techniques in any data analyst's
toolbox. As such, not a great deal is known about differences
to be expected when one set of methods is used in contrast
to some other set of methods. The lack of "maturity" of
these methods contributes to the generally recognized
consensus that use of such methods and interpretation of
results is still more art than science.

Statistically based clustering procedures first
received wide-scale attention from applied research
methods scholars in the mid- and late 1960's. A number
of procedures predate this time frame, but these generally
received neither wide-scale attention nor use. With
the wide availability of high-speed electronic computers
for scientific applications in the mid-60's, however,
iterative approximation procedures became feasible, and
as a result a large number of statistically based cluster
analysis procedures were suggested. Unfortunately but
not atypically, at this stage of the game little is known
about the effects of using particular methods in contrast
to other methods.

The state of the art of statistically based
clustering procedures may be contrasted to the state of the
art of factor analysis. Much of the conceptual work on
factor analysis procedures was done in the 1935 to 1955
time frame. In the 1960's, differences that could be
expected in application of one procedure vs. another were
delineated, and several widely accepted "standard" procedures
were adopted for general use, at least as an initial step,
by applied researchers. The state of the art in 1975
of statistically based clustering procedures is not unlike
the state of the art of factor analysis in the mid-1950's,
one where many procedures have been suggested in the
literature with little guidance available to the applied researcher
as to which technique is best or at least most commonly used
for various types of applications.

A very legitimate question to ask regarding the 
results in Chapter IV, and in particular regarding the lack 
of comparability of the results from the present study to 
the results of the RAND study, is that of "How much did 
the use of differing cluster analytic methodologies, albeit 
from the same framework, contribute to the lack of compara- 
bility?" Given the state of the art of cluster analysis, 
the answer must be "We don't know." 

One hypothesis that has intuitive appeal and also 
a wide acceptance among scholars dealing with statisti- 
cally based clustering procedures is that data sets having 
clear and unambiguous clusters of objects will be resolved 
appropriately by most of the techniques suggested. Data 
sets having unclear or ambiguous cluster structure are 
likely to yield differing results upon application of 
differing procedures. Such data sets are likely to yield
widely variant results even upon application of very slight
variants of one given procedure. Accepting this sort of
hypothesis, and overlooking differences in results
potentially due to data and substantive factors discussed
below, one conclusion that might be drawn from the present
study is that medical schools, as described by a wide
range of institutional variables, present a sufficiently
unclear and ambiguous set of objects as to resist
meaningful or consistent resolution into groups via
statistically based clustering procedures.
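
The hypothesis can be illustrated with a small Python sketch. It rests on assumptions not taken from the report: the clustering routine is a generic k-means style iteration started from explicit seed points (offered only in the spirit of the Forgy application, whose details are not given here), agreement is measured with the Rand index (named for W. M. Rand, unrelated to the RAND Corporation), and both data sets are hypothetical.

```python
from itertools import combinations

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))

def kmeans_from_seeds(points, seeds, max_iter=100):
    """Alternate between assigning points to the nearest center and moving
    each center to the mean of its members, until assignments stop changing."""
    centers = [list(s) for s in seeds]
    labels = [nearest(pt, centers) for pt in points]
    for _ in range(max_iter):
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        updated = [nearest(pt, centers) for pt in points]
        if updated == labels:
            break
        labels = updated
    return labels

def rand_index(a, b):
    """Share of object pairs on which two partitions agree (same group in
    both, or different groups in both); 1.0 means identical groupings."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical data: two tight, well-separated groups ...
clear = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clear_a = kmeans_from_seeds(clear, seeds=[clear[0], clear[3]])
clear_b = kmeans_from_seeds(clear, seeds=[clear[0], clear[1]])

# ... versus evenly spaced points with no natural grouping.
vague = [(0,), (1,), (2,), (3,), (4,), (5,)]
vague_a = kmeans_from_seeds(vague, seeds=vague[:3])
vague_b = kmeans_from_seeds(vague, seeds=vague[3:])

print("clear structure, agreement:", rand_index(clear_a, clear_b))
print("vague structure, agreement:", rand_index(vague_a, vague_b))
```

In this toy case the two starting configurations recover the same two groups from the well-separated data, while the evenly spaced data are partitioned differently depending on the seeds. This is the behavior the hypothesis describes, although a sketch of this size proves nothing about the medical school data.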

2. Construction of a Researchable Data Set

There are two elements in this aspect of the study
that require some discussion. The first element is the
availability of data and the selection of available data.
Despite the abundance of data elements available in the
IPS, there is little guidance as to which elements are
most important or potentially most important in the present
type of analysis. Important data elements may be missing
from the available set; likewise, elements may be
available but, due to lack of knowledge, experience and/or
previous research, not selected for analysis. Continued
efforts in the analysis of such data by procedures
similar to the ones described in Chapter II are indicated
as the only way to settle upon "key" data elements useful
in any categorization of medical schools.

Second, the data preparation aspects necessarily
involve arbitrary decisions. Scaling of data elements
(i.e., standardization, transformations, etc.) and handling
of missing data elements are of primary concern here. It
is possible that further experience with the data and the
methods will lead to variations in the data preparation
procedures that substantively affect results. Again, only
further efforts will shed light on the importance of
these procedures.
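
As an illustration of the kind of arbitrary choice involved, the following Python sketch fills missing values with the mean of the observed values and then converts a variable to standard (z) scores. Both choices are assumptions made for the example, as are the function names and the four data values; the report does not specify here which scaling or missing-data rules were applied.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace None entries with the mean of the observed values.
    Note that this shrinks the variable's variance."""
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    m = mean(observed)
    return [m if x is None else x for x in column]

def standardize(column):
    """Convert values to z-scores (mean 0, population standard deviation 1)."""
    m, s = mean(column), pstdev(column)
    if s == 0:
        return [0.0 for _ in column]  # a constant variable carries no spread
    return [(x - m) / s for x in column]

# Hypothetical revenue figures for four schools, one of them missing.
revenue = [10.0, None, 30.0, 20.0]
filled = impute_mean(revenue)
z = standardize(filled)
print(filled)
print([round(v, 3) for v in z])
```

A different rule at either step (dropping incomplete schools, say, or using a rank transformation) would change the distances between schools and could therefore change the clusters obtained.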
B. Substantive Considerations 

The results presented in Chapter IV indicate quite 
clearly that the classification of medical schools ob- 
tained by the RAND study was not replicated by the present 
study. The lack of replication may have been caused by 
a number of factors or combination of factors. Among 
potential explanatory factors are the methodological 
considerations discussed above, both procedural differences 
and data manipulative differences. The lack of compara- 
bility may also be due to one or more of a variety of 
substantive considerations. 

First, the data used by RAND was 1969-70 data, whereas
the data used by the present study was 1973-74 data. It
is possible that in this four year period, schools changed
their profiles, as reflected by the data analyzed,
sufficiently to change drastically any meaningful grouping of
schools. In other words, it is possible that the
classification structure presented by the RAND study was a "best"
resolution for 1969-70 data and that the classification
structure presented in the present paper is the "best"
resolution based on 1973-74 data. Under this possible
explanation, the differences are due to real changes that
took place between 1969-70 and 1973-74, and these changes
are reflected in the differing classificatory structures.

Second, the "quality" of the data analyzed by RAND
and the "quality" of the data analyzed in the present
study may be a factor in the lack of comparability of
results. Much of the data analyzed by RAND was obtained
from the AAMC. Over the past five years, the AAMC has
conducted numerous activities to improve the quality and
comparability of data collected from its constituency.
It is felt that the quality, comparability, integrity
and completeness of data collected has improved significantly
during this time frame. Thus, the differences in the
quality of the data analyzed by RAND and the quality of
the data analyzed in the present effort may account at
least in part for the lack of comparability of results.

Third, the measurement level of the data analyzed
differed between the two efforts for many variables.
Some of the data supplied by AAMC to RAND was of a
sensitive nature for each school; thus, to protect confidentiality,
the data transmitted to RAND was converted to decile
scores. The data analyzed by the present study was not
subject to such a constraint. Such differences in
measurement level for the data analyzed may again account
at least in part for the lack of comparability of results.
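
The effect of such a conversion can be sketched as follows. The rank-based rule below is an assumption (the report does not say how the deciles were formed), and the twenty values are hypothetical.

```python
def decile_scores(values):
    """Map each value to a decile from 1 (lowest tenth) to 10 (highest tenth)
    according to its rank; tied values share the lowest applicable rank."""
    n = len(values)
    ranked = sorted(values)
    scores = []
    for v in values:
        rank = ranked.index(v)  # number of values strictly below v
        scores.append(rank * 10 // n + 1)
    return scores

# Hypothetical, strongly skewed values for twenty schools.
raw = [3, 5, 8, 13, 21, 34, 55, 89, 144, 233,
       377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657]
deciles = decile_scores(raw)
print(deciles)
```

The two largest hypothetical values differ by more than ten thousand yet receive the same decile score, while the raw data keep that difference. Distances between schools, and hence cluster assignments, can differ accordingly between decile-scored and unconverted data.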



Fourth, the set of variables analyzed by the two 
studies was not a complete match. Only 23 of the 31 
variables analyzed by RAND were available, and even some of 
these were only approximations to the RAND variables. 
Even though the factor structure of the 23 variables 
seemed to be comparable to the factor structure of the 
31 variables analyzed by RAND, it is possible that the 
factor scores were substantively different, due to the lack 
of variable set match. 

Finally, the sets of schools analyzed by each study were 
not completely comparable. RAND analyzed data from 94 
schools; the present study analyzed data from 99 institu- 
tions; there were only 83 schools common to both analyses. 
The differences in the analysis samples may have contri- 
buted to alterations in the measurement space sufficient 
to cause at least in part the non-comparability of
results.
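
The extent of the overlap follows directly from the three counts given above; the short Python sketch below only carries out that arithmetic (the variable names are illustrative).

```python
rand_n, present_n, common_n = 94, 99, 83   # counts reported in the text

rand_only = rand_n - common_n              # schools analyzed only by RAND
present_only = present_n - common_n        # schools analyzed only in the present study
union_n = rand_n + present_n - common_n    # schools appearing in at least one study

print(rand_only, present_only, union_n)
print(f"{common_n / rand_n:.0%} of RAND's schools and "
      f"{common_n / present_n:.0%} of the present study's schools are common to both")
```

Thus 11 schools appear only in the RAND analysis and 16 only in the present one, so that 27 of the 110 schools covered by either study help shape the measurement space of one solution but not the other.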

C. Conclusion

A number of conclusions may be drawn from the work 
presented in this report. 

First, it should be concluded that classification
methods based on statistical cluster analysis
have been developed and implemented for use on
institutionally descriptive data stored in the AAMC's
Institutional Profile System. This conclusion is directly
relevant to the tasks to be accomplished by AAMC under
contract.
Second, it also should be concluded that data from
the AAMC's Institutional Profile System were extracted,
massaged, and analyzed via the clustering procedures.
This conclusion is again relevant to fulfillment of the
contract.

Third, it may be concluded that the present study 
did not find evidence for the replicability of the results 
of the RAND study. In particular, the 10 clusters of 
medical institutions found by RAND were not found in the 
present study. A variety of factors were discussed that 
may have contributed to this conclusion, including a 
substantial number of both methodological and data differences 
between the studies. 

Finally, it may be concluded that, at least for the 
present, categorization of medical schools via procedures 
accounting for multiple measures simultaneously does not yield 
clear and unambiguous results. Such a conclusion must 
be drawn given the lack of comparability of the present 
results and the RAND results. 

This last conclusion should not be taken as a 
suggestion that such analyses are not useful; rather, it 
should be taken as an indication that the picture presented 
by data institutionally descriptive of U.S. medical schools 
is a highly complex one, one that despite the perceived 
need is not easily structured into a reasonably small 
number of groups of institutions. Now that such methods
have been developed and are available for future
utilization, it is logical to obtain further results to either
substantiate or reject the conclusion. Further results
are also needed to test the hypothesis that clear, unambiguous
categorizations may be made based on analysis of subsets
of variables from intuitively or empirically related domains.
ABBREVIATIONS

ACADM                 ACADEMIC
ADMISS                ADMISSIONS
ADMN & GEN            ADMINISTRATIVE & GENERAL
ADV DEGREE            ADVANCED DEGREE
ADVIS PROG            ADVISORY PROGRAM
AFFIL                 AFFILIATED
AM                    AMERICAN
AMBUL                 AMBULATORY
AMT                   AMOUNT
ANESTH                ANESTHESIOLOGY
APPL                  APPLICANT, APPLICATION
ASSOC PROF            ASSOCIATE PROFESSOR
ASSOC PROF MD         ASSOCIATE PROFESSOR OF MEDICINE
ASSTD                 ASSISTED
AV                    AVERAGE

BACH                  BACHELORS DEGREE
BAS SCI               BASIC SCIENCE
BEHAV OBJS PUBLSHD    BEHAVIORAL OBJECTIVES PUBLISHED
BLDG                  BUILDING

CL SCI                CLINICAL SCIENCE
CONSTR                CONSTRUCTION
CURR                  CURRICULUM

DEPT                  DEPARTMENT
DEV                   DEVELOPMENT
DOC                   DOCTORATE
DOC CAND              DOCTORAL CANDIDATE
DOC CONFRD            DOCTORALS CONFERRED

ED                    EDUCATION
ENDOW                 ENDOWMENTS
ENTERING              ENTERING STUDENTS
EQUIP                 EQUIPMENT
EXPD                  EXPENDITURES

FAC                   FACULTY
FED                   FEDERAL
FMS                   FOREIGN MEDICAL STUDENTS
FT FAC                FULL-TIME FACULTY

GPA                   GRADE POINT AVERAGE
GRAD                  GRADUATION
GRTS                  GRANTS

HLTH                  HEALTH
HMO                   HEALTH MAINTENANCE ORGANIZATIONS
HOSPS                 HOSPITALS
HS SR                 HIGH SCHOOL SENIOR

INDUS                 INDUSTRY
INNOVATN              INNOVATION
INSTR                 INSTRUCTOR, INSTRUCTED
INSTR & DEPT RESRCH   INSTRUCTION & DEPARTMENTAL RESEARCH

LOC                   LOCAL

MANGMT                MANAGEMENT
MAS                   MASTERS DEGREE
MC                    MEDICAL COLLEGE
MCAT SCORE GEN        MCAT SCORE GENERAL KNOWLEDGE
MCAT SCORE SCI        MCAT SCORE SCIENCE
MCAT SCORE VER        MCAT SCORE VERBAL
MCAT SCORE QUAN       MCAT SCORE QUANTITATIVE
MD                    MEDICAL

NATL BDS              NATIONAL BOARDS
NEED & RECVD AID      NEEDED & RECEIVED AID
NON-GOVT              NON-GOVERNMENT

PCT                   PERCENT
PHYS ASST             PHYSICIAN'S ASSISTANT
POP                   POPULATION
PRIM CARE             PRIMARY CARE
PRIV                  PRIVATE
PROF                  FULL PROFESSOR
PROF MD               PROFESSOR OF MEDICINE
PROG                  PROGRAM
PROJ                  PROJECT
PROJTD                PROJECTED
PT FAC                PART-TIME FACULTY

RECVD                 RECEIVED
REG OP COSTS          REGULAR OPERATING COSTS
REQ AID               REQUESTED AID
REQ & RECVD AID       REQUESTED & RECEIVED AID
RESDNTS               RESIDENTS
RESRCH                RESEARCH
REV                   REVENUES
REV CAREER            REVIEW CAREER

SCH                   SCHOOL
SELECTD               SELECTED
SERV                  SERVICE
SMSA                  STANDARD METROPOLITAN STATISTICAL AREA
SPONS                 SPONSORED
ST                    STATE
STUDENT EQUIV         STUDENT EQUIVALENT

TCH-TRN               TEACHING-TRAINING
TOT                   TOTAL
TRANS STUDENTS        TRANSFERRED STUDENTS
TUIT & EXPEN          TUITION & EXPENSES

UNIV                  UNIVERSITY
UNRESTR               UNRESTRICTED

VOL                   VOLUMES
VOL FAC               VOLUNTARY FACULTY

WITHDRL               WITHDRAWALS

YR                    YEAR
APPENDIX A

VARIABLES LIST FOR CLASSIFICATION
OF MEDICAL INSTITUTIONS STUDY
AND INTERRELATIONSHIPS STUDY

INSTITUTION                                                   IPS SOURCE VARIABLES  MATHEMATICAL TRANSFORMATIONS

*** GENERAL CHARACTERISTICS ***
V1000 MC-IDENTIFICATION CODE                                  IPS
V1010 STATE MC LOCATED                                        IPS
V1020 REGION MC LOCATED                                       IPS
V1030 CONTROL TYPE (LOW=PUBLIC HIGH=PRIVATE)                  IPS
V1040 YEAR FOUNDED                                            3064
V1045 AGE OF INSTITUTION                                      YR 1974-3064
V1050 2 OR 4 YR SCH                                           3066
V1060 ACCREDITATION                                           3065
V1070 MC TYPE & HOSPITAL                                      2847
V1071 UNIV AFFIL HOSPITAL                                     2847
V1072 UNIV OR ANY AFFIL HOSPITAL                              2847
V1080 TOT BEDS AFFIL HOSPITAL                                 American Hospital Association, Curriculum Directory
V1085 RATIO AFFIL HOSP BEDS TO MD STUDENTS                    American Hospital Association, Curriculum Directory
V1090 NUMBER OF DEANS APPNTD 60-74                            Department of Institutional Development

*** DEMOGRAPHIC ***
V1100 MC LOCATION-SMSA POP 71                                 0366
V1110 MC LOCATION-IMMEDIATE LOCATION POP 71                   0367
V1120 MC LOCATION-IMMEDIATE LOCATION POP-DENSITY 71           0368
V1130 MC LOCATION-SMSA POP-PCT NON-WHITE                      0369
V1140 SMSA POP PER MD STUDENT                                 0366/1391

*** LIBRARY ***
V1200 MC LIBRARIES-TOT VOL                                    2223
V1210 MC LIBRARIES-ACQUISITIONS                               2224
V1220 MC LIBRARIES-TOT SERIAL TITLES RECVD                    2225

FINANCES (ACADEMIC YR 72-73)

*** REVENUES ***
-- TOTALS BY SOURCE --
V2000 MC REV-TOT ALL SOURCES                                  1120  /1000
V2010 MC REV-TOT FED SOURCES                                  3129  /1000
V2017 PCT OF MC REV FROM FED SOURCES + INDIRECT COST RECOVERY  (1112+3129)/1120  x100

-- TOTALS BY SOURCE (UNRESTR) --
V2100 MC REV-TOT UNRESTR PROFESSIONAL FEES, MD SERV PLANS     1118  /1000
V2110 MC REV-TOT UNRESTR ENDOW & GIFTS                        1093  /1000
V2115 PCT OF TOT MC REV FROM UNRESTR ENDOW & GIFTS            (1094+1098/1120)  x100
V2120 MC REV-TOT UNRESTR STUDENT TUITION & FEES               1084  /1000
V2125 PCT OF TOT MC REV FROM UNRESTR STUDENT TUITION & FEES   (1086/1120-1102-110, -1111-1114)  x100
V2130 MC REV-TOT UNRESTR FED, ST, LOC SOURCES                 1092  /1000
V2140 MC REV-TOT UNRESTR GIFTS BUSINESS & INDUS               1096  /1000
V2145 PCT OF TOT MC REV FROM UNRESTR GIFTS BUSINESS & INDUS   (1096/1098)  x100
V2150 MC REV-TOT UNRESTR GIFTS FOUNDATION                     1095  /1000
V2155 PCT OF TOT MC REV FROM UNRESTR GIFTS FOUNDATIONS        (1095/1098)  x100
V2160 MC REV-TOT UNRESTR GIFTS ALUMNI                         1094  /1000
V2165 PCT OF TOT REV FROM UNRESTR GIFTS ALUMNI                (1094/1099)  x100
V2170 MC REV-TOT GIFTS                                        1098  /1000

-- RECOVERY OF INDIRECT COSTS OF SPONS PROGS --
V2200 MC REV-TOT INDIRECT COSTS RECOVERY                      1115  /1000
V2210 MC REV-INDIRECT COSTS RECOVERY NON-GOVT                 1114  /1000
V2220 MC REV-INDIRECT COSTS RECOVERY FED PROG                 1112  /1000

-- SPONSORED TOTALS BY SOURCE --
V2300 MC REV-TOT FED SPONS PROG                               3129  /1000
V2310 MC REV-TOT MULTI & SERV SPONS PROG                      1111  /1000

-- SPONSORED RESEARCH BY SOURCE --
V2400 MC REV-TOT SPONS RESRCH                                 1102  /1000
V2405 PCT OF TOT MC REV FOR SPONS RESRCH                      (1102/1120)  x100
V2410 MC REV-TOT FED SPONS RESRCH                             1099  /1000
V2415 PCT OF TOT SPONS RESRCH FROM FED                        (1099/1102)  x100
V2420 MC REV-TOT ST, LOC SPONS RESRCH                         1100  /1000
V2425 PCT OF TOT SPONS RESRCH FROM ST, LOC                    (1100/1102)  x100
V2430 MC REV-TOT NON-GOVT SPONS RESRCH                        1101  /1000
V2435 PCT OF TOT SPONS RESRCH FROM NON-GOVT                   (1100+1101/1102)  x100

-- SPONSORED TCH-TRN BY SOURCE --
V2500 MC REV-TOT SPONS TCH-TRN                                1107  /1000
V2505 PCT OF TOT MC REV FROM SPONS TCH-TRN                    (1107/1120)  x100
V2510 MC REV-TOT FED SPONS TCH-TRN                            1104  /1000
V2515 PCT OF TOT SPONS TCH-TRN FROM FED                       (1104/1107)  x100
V2520 MC REV-TOT ST, LOC SPONS TCH-TRN                        1105  /1000
V2525 PCT OF TOT SPONS TCH-TRN FROM ST, LOC                   (1105/1107)  x100
V2530 MC REV-TOT NON-GOVT SPONS TCH-TRN                       1106  /1000
V2535 PCT OF TOT SPONS TCH-TRN FROM NON-GOVT                  (1105+1106/1107)  x100

*** EXPENDITURES ***

-- TOTALS BY FUNCTIONAL CATEGORY (UNRESTR) --
V2600 MC EXPD-TOT UNRESTR                                     1137  /1000
V2610 MC EXPD-TOT UNRESTR ADMN & GEN                          1136  /1000
V2615 PCT OF TOT UNRESTR MC EXPD FOR ADMN & GEN               (1136/1137)  x100
V2620 MC EXPD-TOT UNRESTR ACADM SALARY, FEES TOT ACTUAL       1251  /1000



V2625 PCT OF TOT UNRESTR MC EXPD FOR ACADM SALARY, FEES       (1251/1137)  x100
V2630 MC EXPD-TOT UNRESTR INSTR & DEPT RESRCH                 1124  /1000
V2635 PCT OF TOT UNRESTR MC EXPD FOR INSTR & DEPT RESRCH      (1124/1137)  x100
V2640 MC EXPD-TOT UNRESTR PUBLIC SERV                         1130

-- EXPENDITURES PER STUDENT & STAFF --
V2700 INSTR & DEPT RESRCH EXPD PER STUDENT                    1126/(1257+0551+0550+3130+3137+1559)
V2710 INSTR & DEPT RESRCH EXPD PER FAC                        1126/3127
V2720 MC EXPD-TOT UNRESTR PER MD STUDENT                      1137/1257
V2730 MC EXPD-TOT UNRESTR PER FT FAC                          1137/3127
V2740 SPECIAL PROJ $ PER MD STUDENT 72-73                     1205/1257
V2750 TOT MC EXPD PER TOTAL STUDENTS                          1137/(1391+1559+3130+1549+1548)

-- SPONSORED EXPENDITURES --
V2800 MC EXPD-TOT SPONS RESRCH                                1126  /1000
V2805 PCT OF TOT MC EXPD FOR SPONS RESRCH                     (1126/1137)  x100
V2810 MC EXPD-TOT SPONS TCH-TRN                               1128  /1000
V2815 PCT OF TOT MC EXPD FOR SPONS TCH-TRN                    (1128/1137)  x100
V2820 PCT SPONS FAC SALARY FROM FED $ 72-73                   (1162/1168)  x100
V2830 MC EXPD-PCT SPONS PROG EXPD OF TOT                      (1159/1137)  x100
V2840 MC EXPD-TOT SPONS PROGS-ALL TYPES                       1159  /1000

*** NIH AWARDS ***
V2900 NIH AWARDS-PROG, PROJ & CENTER GRTS $1000               1120  /1000
V2910 NIH AWARDS-RESRCH GRTS $1000 67-68                      2249  /1000
V2920 NIH AWARDS-RESRCH GRTS $1000 68-69                      2250  /1000
V2930 NIH AWARDS-RESRCH GRTS $1000 72-73                      2254  /1000
V2940 NIH AWARDS-RESRCH GRTS $1000 73-74                      2255  /1000
V2950 NIH AWARDS PCT CHANGE                                   (2250-2249/2249)+(2254-2250/2250)+(2255-2254/2254/3)  x100
V2951 NIH RESRCH $ PCT CHANGE                                 (2254+2255)-(2249+2250)/(2249+2250)  x100

*** CONSTRUCTION ***

-- FUNDS BY SOURCE --
V3000 CONSTR FUNDS-TOT FED                                    1937  /1000
V3005 PCT OF TOT CONSTR FUNDS FROM FED                        (1937/1935)  x100
V3010 CONSTR FUNDS-TOT ST                                     1938  /1000
V3015 PCT OF TOT CONSTR FUNDS FROM ST                         (1938/1935)  x100
V3020 CONSTR FUNDS-TOT PRIV GIFTS                             1939  /1000
V3025 PCT OF TOT CONSTR FUNDS FROM PRIV GIFTS                 (1939/1935)  x100
V3030 CONSTR FUNDS-TOT OTHER                                  1940  /1000
V3035 PCT OF TOT CONSTR FUNDS FROM OTHER                      (1940/1935)  x100

-- BUILDING COSTS --
V3100 BLDG CONSTR COSTS-TOT                                   1935  /1000
V3110 MOVABLE EQUIP CONSTR COSTS-TOT                          1936  /1000
-- BUILDING USE --
V3200 CONSTR BLDG USE-PCT FOR TCH                             1941  /1000
V3210 CONSTR BLDG USE-PCT FOR RESRCH                          1942  /1000
V3220 CONSTR BLDG USE-PCT FOR MD SERV                         1943  /1000
V3230 CONSTR BLDG USE-PCT FOR OTHER                           1944  /1000

*** GENERAL ***
V3300 PROFESSIONAL FEES RECVD PER CL SCI FAC                  (1118/1030)  x100
V3310 MC LIBRARIES-BUDGET, BOOKS, PERIODICALS, BINDING        2218
V3320 MC EXPEN-SPONS RESRCH PER FT FAC                        1126/3127
V3325 MC EXPEN-SPONS RESRCH PER MD STUDENT                    1126/1257
V3330 MC EXPEN-SPONS TCH-TRN PER MD STUDENT                   1128/1257
V3340 MC EXPEN-REG OP COSTS
V3346 MC REV-TOT PER MD STUDENT                               ii2o/i3n
V3350 SPONS PROG EXPD PER FT FAC                              1159/3132

ACADEMIC PROGRAM

*** GENERAL ***
V4000 OFFER COMBINED DOC+MD PROG 74-75                        1321
V4010 USE NATL BDS PT 1-PROMOTION TEST 74-75                  1359
V4020 USE NATL BDS PT 2-GRADUATION TEST 74-75                 1362
V4030 MINIMUM MONTHS INSTR FOR MD DEGREE                      2059
V4035 UNIT FOR RESRCH & DEV OF ED PROCESS                     1378
V4040 MC PERMITS PASS-FAIL GRADING                            1352
V4050 TYPE GRADING-HONORS, PASS, FAIL 74-75                   1353
V4060 HLTH PRACTITIONER PROG-PHYS ASST 73                     0387
V4070 HLTH PRACTITIONER PROG-NURSING 73                       0388
V4080 HLTH PRACTITIONER PROG-MEDEX 73                         0389
V4090 HLTH PRACTITIONER PROG-MIDWIFE NURSE 73                 0390

*** CURRICULUM ***
V4100 CURR INNOVATN-AMBUL PRIM CARE PROG 74-75                1350
V4110 CURR INNOVATN-SPECLTY TRACKS 74-75                      1351
V4120 CURR INNOVATN-CL APPL COMPUTERS 74-75                   1343
V4130 CURR INNOVATN-COMPUTER ASSTD INSTR 74-75                1344
V4140 CURR ELECTIVES-HUMAN SEXUALITY 74-75                    1332
V4150 CURR ELECTIVES-MD JURISPRUDENCE 74-75                   1333
V4160 CURR ELECTIVES-NUTRITION 74-75                          1334
V4170 CURR ELECTIVES-NON-WESTERN MEDICINE 74-75               1335
V4180 CURR ELECTIVES-POP DYNAMICS 74-75                       1336
V4190 CURR ELECTIVES-DRUG ABUSE 74-75                         1337
V4200 CURR ELECTIVES-ALCOHOLISM 74-75                         1338
V4210 CURR ELECTIVES-MD HYPNOSIS 74-75                        1339
V4220 CURR ELECTIVES-ETHICAL PROBLEMS 74-75                   1340
V4230 CURR ELECTIVES-HLTH CARE DELIVERY 74-75                 1341
V4240 CURR-FAMILY MD PROG 74-75                               2066
V4250 CURR-FAMILY MD GRAD PROG 73                             0403
V4260 CURR-PRIMARY CARE PROG 74-75                            2071
V4270 CURR-ACCELERTD PROG-MD DEGREE LESS THAN 6 YRS           1310
V4280 CURR-RESRCH & DEV OF ED PROCESS 74-75                   1378
V4290 CURR-REQUIRED AMBUL CARE EXPERIENCE 73                  0370
V4300 CURR-PCT UNDERGRAD EXPERIENCE AMBUL CARE 73             0372
V4310 CURR-PRIM CARE DEPT ENCOURAGE GENERALIST 73             0375
V4320 CURR-TOT MD STUDENTS OPERATIONAL HMO 73                 0381
V4325 CURR-HLTH PRACTITIONER PROG 73
V4330 CURR-EMERGENCY CARE PROG 73                             0416
V4340 CURR-PATIENT CARE PROG-ALCOHOLISM OR DRUG ABUSE 73      0420
V4350 CURR-HLTH CARE MANGMT PROG 73                           0424
V4360 STATEMNT OF BEHAV OBJS PUBLSHD                          1374

FACULTY

*** STAFF ***

-- TOTAL TEACHING STAFF --
V5000 FT FAC-TOT ALL DEPT 72-73                               3127
V5010 FT FAC-TOT ALL DEPT 73-74                               3132
V5020 RATIO-FT FAC TO MD STUDENTS                             1391/3132
V5025 RATIO FT FAC TO TOTAL STUDENTS                          3132/1391+1559+3130+1549+1548
V5030 RATIO PT FAC TO FT FAC                                  (1734+1786)/3132
V5040 RATIO VOL FAC TO FT FAC                                 (1768+1804)/3132

-- TOTALS BY MAJOR DISCIPLINE --
V5100 BAS SCI-TOT FT FAC                                      1662
V5110 BAS SCI-TOT PT FAC                                      1734
V5120 BAS SCI-TOT VOL FAC                                     1768
V5130 CL SCI-TOT FT FAC 72-73                                 1030
V5140 CL SCI-TOT FT FAC 73-74                                 1680
V5150 CL SCI-TOT PT FAC                                       1786
V5160 CL SCI-TOT VOL FAC                                      1804
V5170 RATIO-BAS SCI FAC TO CLIN SCI FAC                       1662/1680

-- TOTALS BY RANK --
V5200 PROF-TOT FT-CLI SCI                                     1680
V5205 PROF-PCT FT-CLI SCI                                     (1680/1752)  x100
V5210 ASSOC PROF-TOT FT-CLI SCI                               1698
V5215 ASSOC PROF-PCT FT-CLI SCI                               (1698/1752)  x100
V5220 ASST PROF-TOT FT-CLI SCI                                1716
V5225 ASST PROF-PCT FT-CLI SCI                                (1716/1752)  x100
V5230 INSTR-TOT FT-CLI SCI                                    1734
V5235 INSTR-PCT FT-CLI SCI                                    (1734/1752)  x100
V5240 PROF-TOT FT-BAS SCI                                     1630
V5245 PROF-PCT FT-BAS SCI                                     (1630/1662)  x100
V5250 ASSOC PROF-TOT FT-BAS SCI                               1638
V5255 ASSOC PROF-PCT FT-BAS SCI                               (1638/1662)  x100
V5260 ASST PROF-TOT FT-BAS SCI                                1646
V5265 ASST PROF-PCT FT-BAS SCI                                (1646/1662)  x100
V5270 INSTR-TOT FT-BAS SCI                                    1654
V5275 INSTR-PCT FT-BAS SCI                                    (1654/1662)  x100

-- VACANCIES --
V5300 VACANCIES-FT FAC-CL SCI                                 1934
V5310 VACANCIES-FT FAC-BAS SCI                                1844
V5320 PCT BUDGETED VACANCIES-CL SCI                           1934/(1934+1752)

*** SALARY ***

-- BASIC SCIENCE BY RANK --
V5400 AV TOT SALARY-PROF-BAS SCI 74-75                        3579
V5410 AV TOT SALARY-ASSOC PROF-BAS SCI 74-75                  3530
V5420 AV TOT SALARY-ASST PROF-BAS SCI 74-75                   3531
V5430 AV TOT SALARY-INSTR-BAS SCI 74-75                       3532

-- CLINICAL SCIENCE BY RANK --
V5500 AV TOT SALARY-PROF-CL SCI 74-75                         3584
V5510 AV TOT SALARY-ASSOC PROF-CL SCI 74-75                   3586
V5520 AV TOT SALARY-ASST PROF-CL SCI 74-75                    3537
V5530 AV TOT SALARY-INSTR-CL SCI 74-75                        3588

-- DEPARTMENT OF MEDICINE BY RANK --
V5540 AV TOT SALARY-PROF MD-CL SCI 74-75                      3640
V5550 AV TOT SALARY-ASSOC PROF MD-CL SCI 74-75                3641
V5560 AV TOT SALARY-ASST PROF MD-CL SCI 74-75                 3642
V5570 AV TOT SALARY-INSTR MD-CL SCI 74-75                     3643

-- ANESTHESIOLOGY BY RANK --
V5600 AV TOT SALARY-PROF-ANESTH 74-75                         3620
V5610 AV TOT SALARY-ASSOC PROF-ANESTH 74-75                   3621
V5620 AV TOT SALARY-ASST PROF-ANESTH 74-75                    3622
V5630 AV TOT SALARY-INSTR-ANESTH 74-75                        3623

STUDENT ADMISSIONS

*** ENROLLMENT ***

-- STUDENT BODY TOTALS --
V6000 ENROLL-TOT STUDENTS                                     1391+1548+1549+3138+3130
V6010 TOT STUDENTS...ALL...INSTRUCTED AT MC                   1391+1559+3130+1549+1548
V6020 ENROLL-TOT MD STUDENTS 73-74                            1391



V6025 ENROLL-TOT MD STUDENTS 72-73

V6030 ENROLL-ACTUAL GROWTH RATE

V6040 ENROLL-TOT MD STUDENT EQUIV INSTR BY MD

V6050 ENROLL RATIO-MD STUDENTS EQUIV TO MD STUDENTS

V6080 ENROLL RATIO-INTERNS & RESDNTS TO MD STUDENTS

V6090 ENROLL RATIO-INTERNS TO MD STUDENTS

V6100 ENROLL RATIO-RESDNTS TO MD STUDENTS

V6110 ENROLL-TOT FINAL YR STUDENTS-MAS & DOC CAND-BAS SCI

V6120 ENROLL-TOT FINAL YR STUDENTS-MAS & DOC CONFRD

V6130 ENROLL-TOT FINAL YR STUDENTS-NON-DEGREE CAND

V6140 ENROLL RATIO-MAS & DOC BAS SCI TO MD STUDENTS 

V6160 ENROLL RATIO-MAS & DOC CONFRD TO TOT ENROLL 

— IN STATE-OUT OF STATE STUDENTS —

V6200 ENROLL-TOT IN ST MD STUDENTS 

V6210 ENROLL-TOT OUT ST MD STUDENTS 

V6220 ENROLL RATIO-IN ST TO OUT ST MD STUDENTS

V6230 PCT MD STUDENT FROM HOME STATE 

— STUDENTS PER FACULTY —

V6300 TOT RESDNTS INSTR BY MD FAC 72-73 

V6310 TOT RESDNTS INSTR BY MD FAC 73-74 

V6320 TOT INTERNS INSTR BY MD FAC 72-73 

V6330 TOT INTERNS INSTR BY MD FAC 73-74 

— PROJECTED ENROLLMENT —

V6400 PROJTD ENROLL-TOT FINAL YR MD STUDENTS 74-75 

V6410 PROJTD ENROLL-TOT FINAL YR MD STUDENTS 75-76 

V6420 PROJTD ENROLL-TOT FINAL YR MD STUDENTS 76-77 

V6430 PROJTD ENROLL-PCT GROWTH MD STUDENTS 74-77

V6440 PROJTD ENROLL-TOT 1ST YR MD STUDENTS 74-75 

V6450 PROJTD ENROLL-TOT 1ST YR MD STUDENTS 75-76 

V6460 PROJTD ENROLL-TOT 1ST YR MD STUDENTS 76-77

V6470 PROJTD ENROLL-TOT 1ST YR MD STUDENTS 77-78 

V6480 PROJTD ENROLL-TOT 1ST YR MD STUDENTS 78-79 

V6491 PROJTD ANNUAL GROWTH RATE 74-78

— BY CLASS —

V6500 ENROLL-TOT 1ST YR MD STUDENTS

V6510 ENROLL-TOT MID YR MD STUDENTS

V6520 ENROLL-TOT FINAL YR MD STUDENTS

— BY SEX —

V6600 ENROLL-TOT MALE 1ST YR MD STUDENT

V6605 ENROLL-PCT FEMALE 1ST YR MD STUDENT

V6610 ENROLL-TOT MALE MID YR MD STUDENT

V6615 ENROLL-PCT FEMALE MID YR MD STUDENT



1257 

(1391-1257) /1257 
1559 

1559/1391 

(1549+1548)/1391

1549/1391 

1548/1391 

3130 

3131 

3137

3130/1391 

3131/1391+1548+1549+3138+3130



1970 
1971 

1970/1971 
1970/1391 



0551 
1549 
0550 
1548 



1620 
1621 
1622 

1610 
1611 
1612 
1613 

1614 _ 
(1614/1610) "^-1) 



1382 
1388 
1385 



1380 

((1382-1380)/1380) 
1386 

((1388-1386)/1386)
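The derivation column above builds each derived variable from numbered source items, either as a plain ratio or as a percentage of the form (a/b) x100. The following is a minimal Python sketch of that arithmetic, not the procedure actually used in the report. It assumes that V6030 (actual growth rate) pairs with (1391-1257)/1257 and that V6080 (interns and residents per MD student) pairs with (1549+1548)/1391, which the column order suggests but the extracted layout does not confirm. The item values, the `items` dictionary, and the helpers `ratio` and `percent` are hypothetical and serve only as illustration.

```python
# Hypothetical raw item values keyed by source item number.
# These numbers are invented for illustration; they are not from the report.
items = {1391: 480.0, 1257: 450.0, 1549: 60.0, 1548: 180.0}


def ratio(num, den):
    """Return num / den, or None when the denominator is missing or zero."""
    if den is None or den == 0:
        return None
    return num / den


def percent(num, den):
    """Return (num / den) x 100, the pattern used for the PCT variables."""
    r = ratio(num, den)
    return None if r is None else 100.0 * r


# Assumed pairing: V6030 = (1391 - 1257) / 1257
v6030 = ratio(items[1391] - items[1257], items[1257])

# Assumed pairing: V6080 = (1549 + 1548) / 1391
v6080 = ratio(items[1549] + items[1548], items[1391])
```

Returning `None` for a zero or missing denominator is a design choice of this sketch, so that a school that did not report an item yields a missing derived value rather than an error.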
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V6620 ENROLL -TOT MALE FINAL YR MD STUDENT 

V6625 ENROLL-PCT FEMALE FINAL YR MD STUDENT 

V6630 ENROLL -TOT MALE MD STUDENT 

V6635 ENROLL-PCT FEMALE MD STUDENT 



— FOREIGN MEDICAL STUDENTS — 



V6700 FMS ENROLL-TOT MD STUDENTS

V6705 FMS ENROLL-PCT MD STUDENTS

V6710 FMS ENROLL-TOT 1ST YR MD STUDENTS

V6715 FMS ENROLL-PCT 1ST YR MD STUDENTS

V6720 FMS ENROLL-TOT MID YR MD STUDENTS

V6725 FMS ENROLL-PCT MID YR MD STUDENTS

V6730 FMS ENROLL-TOT GRAD MD STUDENTS

V6735 FMS ENROLL-PCT GRAD MD STUDENTS



— ETHNIC COMPOSITION —

V6800 MD STUDENTS-TOT UNDER REP MINORITY

V6805 MD STUDENTS-PCT UNDER REP MINORITY

V6810 MD STUDENTS-TOT CAUCASIAN MALE

V6820 MD STUDENTS-TOT CAUCASIAN FEMALE

V6830 MD STUDENTS-TOT ORIENTAL-AM MALE

V6840 MD STUDENTS-TOT ORIENTAL-AM FEMALE



— REPEATERS —

V6900 REPEATERS-PCT 1ST YR MD STUDENTS

V6910 REPEATERS-TOT 1ST YR MD STUDENTS MALE

V6920 REPEATERS-TOT 1ST YR MD STUDENTS FEMALE



— WITHDRAWALS —



V7000 WITHDRL-TOT MD STUDENTS-ALL REASONS

V7005 WITHDRL-PCT MD STUDENTS-ALL REASONS

V7010 WITHDRL-TOT 1ST YR-ALL REASONS

V7015 WITHDRL-PCT 1ST YR-ALL REASONS

V7020 WITHDRL-TOT MID YR-ALL REASONS

V7025 WITHDRL-PCT MID YR-ALL REASONS

V7030 WITHDRL-TOT FINAL YR-ALL REASONS

V7035 WITHDRL-PCT FINAL YR-ALL REASONS

*** ENTERING QUALIFICATIONS *** 

— GPA — 



V7100 UNDERGRAD GPA-ENTERING 1ST YR MD STUDENTS

V7110 PRE MD GPA 3.6 TO 4.0-1ST YR MD STUDENTS

V7115 PRE MD GPA 3.6 TO 4.0-PCT 1ST YR MD STUDENTS






1383 

((1385-1383)/1383) x100
1389

((1391-1389)/1389) x100



1394+1396+1395

((1394+1396+1395)/1391) x100
1394

(1394/1382) x100
1396

(1396/1388) x100
1395

(1395/1385) x100



1461 

((1435+1436)/1391) x100

1419
1420

1435
1436



((1490+1491)/1382) x100

1490 

1491 



1529

(1529/1391) x100
1526

(1526/1382) x100
1528

(1529/1371) x100
1527

(1529/1385) x100



1547
1530

(1530/1382) x100



V7120 PRE MD GPA 2.6 TO 3.5-1ST YR MD STUDENTS

V7125 PRE MD GPA 2.6 TO 3.5-PCT 1ST YR MD STUDENTS

V7130 PRE MD GPA LESS THAN 2.6-1ST YR MD STUDENTS

V7135 PRE MD GPA LESS THAN 2.6-PCT 1ST YR MD STUDENTS

V7140 PRE MD GPA UNKNOWN-1ST YR MD STUDENTS

V7145 PRE MD GPA UNKNOWN-PCT 1ST YR MD STUDENTS

— MCAT —

V7200 MEAN MCAT SCORE SCI-1ST YR MD STUDENTS

V7210 MEAN MCAT SCORE VER-1ST YR MD STUDENTS

V7220 MEAN MCAT SCORE GEN-1ST YR MD STUDENTS

V7230 MEAN MCAT SCORE QUAN-1ST YR MD STUDENTS

— DEGREE STATUS —

V7300 TOT BACH-1ST YR MD STUDENTS

V7305 PCT BACH-1ST YR MD STUDENTS

V7310 TOT MAS-1ST YR MD STUDENTS

V7315 PCT MAS-1ST YR MD STUDENTS

V7320 TOT DOC-1ST YR MD STUDENTS

V7325 PCT DOC-1ST YR MD STUDENTS

V7330 PCT ANY DEGREE-1ST YR MD STUDENTS

V7340 TOT OTHER DEGREE-1ST YR MD STUDENTS

V7345 PCT OTHER DEGREE-1ST YR MD STUDENTS

V7350 TOT NO DEGREE-1ST YR MD STUDENTS

V7355 PCT NO DEGREE-1ST YR MD STUDENTS

— UNDERGRADUATE EDUCATION —

V7400 UNDERGRAD ED-2 YRS OR LESS-1ST YR MD STUDENTS

V7405 UNDERGRAD ED-2 YRS OR LESS-PCT 1ST YR MD STUDENTS

V7410 UNDERGRAD ED-3 YRS-1ST YR MD STUDENTS

V7415 UNDERGRAD ED-3 YRS-PCT 1ST YR MD STUDENTS

V7420 UNDERGRAD ED-4 YRS OR MORE-1ST YR MD STUDENTS

V7425 UNDERGRAD ED-4 YRS OR MORE-PCT 1ST YR MD STUDENTS

*** STUDENT AID ***

— REQUESTING —

V7500 REQ AID-TOT MD STUDENTS

V7505 REQ+RECVD AID-PCT MD STUDENTS

V7510 REQ AID-TOT 1ST YR MD STUDENTS

V7515 REQ+RECVD AID-PCT 1ST YR MD STUDENTS

V7520 REQ AID-TOT 2ND YR MD STUDENTS

V7525 REQ+RECVD AID-PCT 2ND YR MD STUDENTS

V7530 REQ AID-TOT 3RD YR MD STUDENTS



1531

(1531/1382) x100
1532

(1532/1382) x100
1533

(1533/1382) x100



1546
1543
1544
1545



1537

(1537/1382) x100
1538

(1538/1382) x100
1539

(1539/1382) x100
((1538+1539+1540)/(1537+1541)) x100

1540

(1540/1382) x100
1541

(1541/1382) x100



1534

(1534/1382) x100
1535

(1535/1382) x100
1536

(1536/1382) x100



1979

(1989/1979) x100
1975

(1985/1975) x100
1976

(1986/1976) x100
1977



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V7535 REQ+RECVD AID-PCT 3RD YR MD STUDENTS 

V7540 REQ AID-TOT FINAL YR MD STUDENTS 

V7545 REQ+RECVD AID-PCT FINAL YR MD STUDENTS

— RECEIVING —

V7600 RECVD AID-TOT MD STUDENTS 

V7610 TOT AID TO MD STUDENTS 

V7615 AV AMT AID TO MD STUDENTS 

V7620 RECVD AID-TOT 1ST YR MD STUDENTS 

V7630 TOT AID TO 1ST YR MD STUDENTS 

V7635 AV AMT AID TO 1ST YR MD STUDENTS 

V7640 RECVD AID-TOT 2ND YR MD STUDENTS 

V7650 TOT AID TO 2ND YR MD STUDENTS

V7655 AV AMT AID TO 2ND YR MD STUDENTS

V7660 RECVD AID-TOT 3RD YR MD STUDENTS 

V7670 TOT AID TO 3RD YR MD STUDENTS 

V7675 AV AMT AID TO 3RD YR MD STUDENTS 

V7680 RECVD AID-TOT FINAL YR MD STUDENTS 

V7690 TOT AID TO FINAL YR MD STUDENTS 

V7695 AV AMT AID TO FINAL YR MD STUDENTS 

— NEEDING —

V7700 NEED AID-TOT MD STUDENTS 

V7705 NEED+RECVD AID-PCT OF TOT MD STUDENTS 

V7710 NEED AID-TOT 1ST YR MD STUDENTS 

V7715 NEED+RECVD AID-PCT 1ST YR MD STUDENTS 

V7720 NEED AID-TOT 2ND YR MD STUDENTS 

V7725 NEED+RECVD AID-PCT 2ND YR MD STUDENTS 

V7730 NEED AID-TOT 3RD YR MD STUDENTS 

V7735 NEED+RECVD AID-PCT 3RD YR MD STUDENTS 

V7740 NEED AID-TOT FINAL YR MD STUDENTS 

V7745 NEED+RECVD AID-PCT FINAL YR MD STUDENTS 

— AID DISPERSED TO STUDENTS —

V7800 AID-AMT PER MD STUDENT

V7810 RECVD AID-LOANS-TOT MD STUDENTS

V7815 RECVD AID-LOANS-PCT MD STUDENTS

V7820 RECVD AID-SCHLSHIP-TOT MD STUDENTS

V7825 RECVD AID-SCHLSHIP-PCT MD STUDENTS

*** EXPENSES ***

— TUITION, EXPENSES, & FEES —

V7900 TUIT+EXPEN PER IN ST MD STUDENT

V7910 TUIT+EXPEN PER OUT ST MD STUDENT

V7920 FEES+EXPEN EXCLUD TUIT PER MD STUDENT

V7930 AV EXPEN PER IN ST MD STUDENT UNMARRIED

V7940 AV EXPEN PER OUT ST MD STUDENT UNMARRIED

V7950 TUIT+EXPEN RATIO-IN ST TO OUT ST



(1987/1977) x100
1978

(1988/1978) x100



1989 
1999 

(1999/1391) x100

1985 

1995 

1995/1985 

1986 

1996 

1996/1986 

1987 

1997 

1997/1987 

1988 

1998 

1998/1988 



1984 

(1989/1984) x100
1980

(1985/1980) x100
1981

(1986/1981) x100
1982

(1987/1982) x100
1983

(1988/1983) x100



1999/1391 
2036 

(2036/1391) x100
2037

(2037/1391) x100



1965
1966
1969
2039
2043

1965/1966 
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*** STUDENT SELECTION *** 

— YEAR— 

V8000 YR SELECTD-HS SR 73

V8010 YR SELECTD-UNDERGRAD FR 74-75

V8020 YR SELECTD-UNDERGRAD SOPH 74-75

V8030 YR SELECTD-UNDERGRAD JR 74-75

V8040 YR SELECTD-UNDERGRAD SR 74-75

— APPLICANTS —

V8100 APPL-TOT

V8110 APPL-TOT MALE

V8115 APPL-PCT MALE TO TOT

V8120 APPL-TOT FEMALE

V8130 RATIO-MALE APPL TO ENTERING

V8140 RATIO-FEMALE APPL TO ENTERING

V8150 RATIO-APPL TO ENTERING

— STANDING —

V8200 MC ACCEPT TRANS STUDENTS

V8210 MC ACCEPT ADV STANDING STUDENTS

*** CAREER REVIEW ***

V8300 REVIEW CAREER CHOICE AT GRADUATION

V8310 REVIEW CAREER CHOICE 5 YRS AFTER GRAD 73

V8330 ADVIS PROG-STUDENT RETENTION 74-75

V8340 CAREER INTENT AFFECTS ADMISS DECISION



1280 
1281 
1282 
1283 
1284 



Division of Student Studies
Division of Student Studies
Division of Student Studies

Division of Student Studies
Division of Student Studies
Division of Student Studies
Division of Student Studies



1286 
1311 



0438 
0439 

1318 
0441 



APPENDIX B 



19 CLUSTER SOLUTION 



9 CLUSTER SOLUTION 



CLUSTER #1



CLUSTER #2 



ARKANSAS 
LOUISVIL 
LA N ORL 
TENNESS 
MISS 

OKLAHOMA 
PR RICO 
MICH ST 

TX S ANT 
CONN
RUTGERS 
S DAKOTA 
GEORGIA 
S CAROL 
TX GALV 
N CAROL 
U VIRGIN 
WISCONSIN 



CLUSTER #1



CLUSTER #3 



CLUSTER #4 



MC VIRG 

WAYNE ST 

VERMONT 

W VIRGIN 

MO COLUM 

ALABAMA 

UTAH 

CINCIN 

SUNY SYR 

KENTUCKY 

NEBRASKA 

SUNY BUF

OREGON 

COLORADO 

N JERSEY 

ARIZONA 

CAL DAV 

FLORIDA 



CLUSTER #2 



CLUSTER #5 



CLUSTER #6 



ILLINOIS 
SUNY DST 
UCLA 

INDIANA 
OHIO ST 
U MICH 
WASH SEA 



CLUSTER #3 



CLUSTER #7 



CLUSTER #8



CLUSTER #9 
CLUSTER #10 



CAL IRV 
RUSH 

STONY BRK 
OHIO TOL 
LA SHREV 
MASS 
SO FLA 
SO ILL 

NEW MEX 
MT SINAI 
CAL S DI 

NEVADA 

E VIRGINIA 



CLUSTER #4 



MAYO 



CLUSTER #5 
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19 CLUSTER SOLUTION 



CLUSTER #11 



U PENN 
CASE WST 
CORNELL 
SO CAL 



9 CLUSTER SOLUTION 



CLUSTER #12 



CLUSTER #13 



DARTMOUTH 
BROWN 

U CHICAGO 
J HOPKINS 
ROCHESTER 
VANDERBILT 



CLUSTER #6 



CLUSTER #14 



CLUSTER #15 



CLUSTER #16 



MIAMI 
TEMPLE 
CHICAGO MED 
M C PENN 
CREIGHTON 
ST LOUIS 
ALBANY 
BOWMAN GRAY 
PITTSBURGH 

GEO WASH 
NWESTERN
HAHNEMAN
HOWARD 
G TOWN 

M C WISCONSIN 
BOSTON 
JEFFERSON 
N Y MED 
LOYOLA 



CLUSTER #7 



TEX TECH 
MINN DUL 
S ALABAMA 



CLUSTER #8 



CLUSTER #17 



CLUSTER #18



EINSTEIN 
N Y UNIV 
STANFORD 
WASH S L 
YALE 

CAL S F 
MINN MPS 



CLUSTER #9 



CLUSTER #19 



TX SWEST 
PENN ST 
DUKE 
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